Abstract


Essentially all program translators (both source-to-source translators and compilers) operate via transliteration and refinement. The source program is first transliterated into the target language on a statement-by-statement basis. Various refinements are then applied in order to improve the quality of the output. Although acceptable in many situations, this approach is fundamentally limited in the quality of the output it can produce. In particular, it tends to be insufficiently sensitive to global features of the source program and too sensitive to irrelevant local details.


This paper presents an alternate translation paradigm: abstraction and reimplementation. Using this paradigm, the source program is first analyzed in order to obtain a programming-language-independent, abstract understanding of the computation performed by the program as a whole. The program is then reimplemented in the target language based on this understanding. The key to this approach is the abstract understanding obtained. It allows the translator to see the forest for the trees, benefiting from an appreciation of the global features of the source program without being distracted by irrelevant details.

Translation via abstraction and reimplementation is one of the goals of the Programmer's Apprentice project. A translator has been constructed which translates Cobol programs into Hibol (a very high level, business data processing language). A compiler has been designed which generates extremely efficient PDP-11 object code for Pascal programs. Currently, work is proceeding toward the implementation of a general purpose, knowledge-based translator.
I - Introduction

The goal of this paper is to present the idea of translation via abstraction and reimplementation and compare it with the standard approach of translation via transliteration and refinement. In the main, this is done through a discussion of the basic ideas behind the two approaches and a discussion of the designs for two translators based on abstraction and reimplementation. In addition, the paper presents a detailed description of an implemented prototype translator which demonstrates the efficacy of the abstraction and reimplementation approach.

The process of program translation takes a program written in some source language and creates an equivalent program in some target language. The primary goal of translation is to create a syntactically correct program in the target language which computes the same thing as the source program in more or less the same way. For a wide variety of source and target languages, satisfying this goal is relatively straightforward.

In addition to the primary goal of correctness, translation typically has one or more subsidiary goals such as efficiency or readability of the target program. In general, the most difficult aspect of translation is not producing correct output, but rather attempting to satisfy these subsidiary goals. The main problem is that the subsidiary goals of translation are typically at best orthogonal to, and at worst in conflict with, the goals of the original author of the source program.

Translations vary widely in quality. An optimal translation would produce the program which the original authors would have produced had they been writing in the target language in the first place and had they had the desired subsidiary goals in mind.

The most common example of program translation is compilation: the translation of a program written in a high level language into machine language. In compilation, the key subsidiary goal is achieving efficiency in the target program. The work on compilers has demonstrated that acceptable efficiency can be obtained. However, there is still a long way to go. Even the best optimizing compilers fall short of the efficiency which programmers can achieve writing directly in machine language.

Another important application of program translation is source-to-source program translation. In this situation, a program is translated from a language which may be in some way obsolete into another language where it can be more easily maintained. In source-to-source translation, the key subsidiary goal is achieving readability (and hence maintainability) of the target program. The use of automatic translation during maintenance has been severely limited by the fact that readability of the target program is very difficult to achieve.

Most current program translators operate by a process which could be called translation via transliteration and refinement. In this process, the source program is first transliterated into the target language on a line-by-line basis by translating each line in isolation. Various refinements are then applied in order to improve the target program produced. As discussed in Section II, this process has a number of advantages. However, it is inherently limited in the extent to which it can satisfy the subsidiary goals of translation. In particular, translation via transliteration and refinement tends to be insufficiently sensitive to global features of the source program and too sensitive to irrelevant local details of the source program.
reimplementation. In this process, the source program is first analyzed in order to obtain an abstract description of the computation being performed. The program is then reimplemented in the target language based on the abstract description. The central feature of this approach is the abstraction step. It allows the translator to benefit from a global understanding of what the source program does. In addition, the abstraction step deliberately discards information about details of the source program which are not relevant to the translation process. Although inherently more complex than translation via transliteration and refinement, translation via abstraction and reimplementation is capable of producing very high quality results.

Sections IV & V present examples of program translators which operate via abstraction and reimplementation. The first example translator (Satch [10]) is a prototype system which translates Cobol programs into Hibol. (Hibol is a very high level, non-procedural, business data processing language.) Satch is notable because it produces extremely readable output. The second example (Cobbler [9]) is a proposed compiler which translates Pascal programs into PDP-11 assembler language. Cobbler is notable because it produces extremely efficient output.

Section VI describes efforts within the Programmer's Apprentice project [28] toward the construction of a general purpose, knowledge-based translation system operating via abstraction and reimplementation. In order to support very high quality translation, this system will have extensive knowledge of how algorithms can be expressed in the source and target languages. In order to make the system general purpose, this knowledge will be represented declaratively in a library of algorithm schemas. Each schema will specify how a class of algorithms can be rendered in the source or target language.

Section VII discusses other work which is relevant to the idea of translation via abstraction and reimplementation. In particular, research on natural language translation has shown that obtaining a global understanding of the source text is essential for producing high quality translations.


II - Translation via Transliteration and Refinement

As shown in Fig. 1, translation via transliteration and refinement operates in two steps. The transliteration step translates the source program on an element-by-element basis. (The word transliteration, as opposed to translation, is used to connote the idea of literal translation where each element is translated in isolation without regard for context.) The output of the transliteration step is expressed either directly in the target language or in an intermediate language which is semantically similar to it.
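The element-by-element character of the transliteration step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular translator: the tuple encoding of statements and the rule table `RULES` are hypothetical, and only the four simple Fortran-to-Ada rules later described for Fig. 3 (assignment, CONTINUE, GO TO, RETURN) are covered. Each rule sees exactly one statement, so nothing about the surrounding program can influence its output.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical statement encoding: (label or None, kind, *operands).
Stmt = Tuple

# One rule per source construct; each rule sees only its own operands.
RULES: Dict[str, Callable[..., str]] = {
    "ASSIGN": lambda var, expr: f"{var} := {expr};",
    "CONTINUE": lambda: "null;",
    "GOTO": lambda target: f"goto L{target};",
    "RETURN": lambda: "return;",
}


def transliterate(program: List[Stmt]) -> List[str]:
    """Translate each statement in isolation, without regard for context."""
    out = []
    for label, kind, *operands in program:
        if kind not in RULES:
            # A construct with no target counterpart blocks transliteration.
            raise NotImplementedError(f"no transliteration rule for {kind}")
        text = RULES[kind](*operands)
        out.append(f"<<L{label}>> {text}" if label is not None else text)
    return out
```

For example, the statements `IER = 0`, `11 IER = 1`, `GO TO 12`, `10 CONTINUE`, `12 RETURN` become `IER := 0;`, `<<L11>> IER := 1;`, `goto L12;`, `<<L10>> null;`, `<<L12>> return;`. A construct with no rule, such as EQUIVALENCE, simply stops this sketch with an error.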


[Fig. 1 diagram: source program -> TRANSLITERATION -> target-like intermediate -> REFINEMENT -> target program]

Fig. 1. Translation via transliteration and refinement.
the efficiency of the code produced. If the intermediate language is not identical to the target language, then the refinement step also performs the (typically trivial) translation from intermediate to final form.

Example of Transliteration and Refinement

As an example of translation via transliteration and refinement, consider how this approach could be used to translate Fortran [34] programs into Ada [37] programs. Fig. 2 shows a Fortran program BOUND which is taken from the IBM Fortran Scientific Subroutine Package [35]. Fig. 3 shows the result of the transliteration step of the translation process. Fig. 4 shows the final result after the refinement step of the translation process.

The program BOUND has six input parameters and four output parameters. The parameter A is a matrix which contains a set of observations of a number of variables, presumably determined in some experiment. The integer parameters NO and NV specify the number of observations and the number of variables respectively. (As is generally the case in the programs in the Scientific Subroutine Package, although A is logically a matrix, it is declared to be a vector and all of the index computations are explicit in the program.)

The parameter S is a vector of length NO. The vector S selects the observations which should be considered by the program BOUND. An observation J is considered only if S(J) is non-zero.

The parameters BLO and BHI are vectors of length NV. For each variable, these vectors specify lower and upper bounds respectively for the observation values. The integer parameter IER is used to return an error code. If BLO(I)>BHI(I) for any I, then IER is set to one and computation is aborted; otherwise it is set to zero.

The parameters UNDER, BETW, and OVER are also vectors of length NV. For each variable I, the program BOUND counts how many of the selected observations are under BLO(I), how many are between BLO(I) and BHI(I) inclusive, and how many are over BHI(I). These counts are stored in the variables UNDER, BETW, and OVER respectively, which are the principal outputs of the program BOUND.
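To make the description above concrete, here is a hedged Python reading of what BOUND computes. It is a sketch, not a translation of Fig. 2: it uses 0-based indices, assumes A is stored column by column (observation J of variable I at Fortran position (I-1)*NO+J, which is what the IJ arithmetic in the program walks through), counts with integers rather than REALs, and returns None for the counts when IER would be 1, since BOUND aborts before touching them in that case. The function name `bound` is chosen for illustration.

```python
from typing import List, Optional, Tuple

Counts = Tuple[List[int], List[int], List[int]]


def bound(a: List[float], s: List[float], blo: List[float], bhi: List[float],
          no: int, nv: int) -> Tuple[Optional[Counts], int]:
    """Return ((under, betw, over), ier); ier is 1 if some BLO(I) > BHI(I)."""
    if any(blo[i] > bhi[i] for i in range(nv)):
        return None, 1  # abort: the counts are never initialized
    under, betw, over = [0] * nv, [0] * nv, [0] * nv
    for j in range(no):
        if s[j] == 0:
            continue  # observation j is not selected
        for i in range(nv):
            x = a[i * no + j]  # column-major: variable i, observation j
            if x < blo[i]:
                under[i] += 1
            elif x <= bhi[i]:  # BLO(I) <= x <= BHI(I), inclusive
                betw[i] += 1
            else:
                over[i] += 1
    return (under, betw, over), 0
```

With three observations of two variables, `bound([1, 5, 9, 2, 2, 7], [1, 1, 0], [2, 2], [6, 6], 3, 2)` ignores the third observation and yields `(([1, 0], [1, 2], [0, 0]), 0)`.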


      SUBROUTINE BOUND(A,S,BLO,BHI,UNDER,BETW,OVER,NO,NV,IER)
      DIMENSION A(1),S(1),BLO(1),BHI(1),UNDER(1),BETW(1),OVER(1)
      IER = 0
      DO 10 I = 1,NV
      IF (BLO(I)-BHI(I)) 10,10,11
   11 IER = 1
      GO TO 12
   10 CONTINUE
      DO 1 K = 1,NV
      UNDER(K) = 0.0
      BETW(K) = 0.0
    1 OVER(K) = 0.0
      DO 8 J = 1,NO
      IJ = J-NO
      IF (S(J)) 2,8,2
    2 DO 7 I = 1,NV
      IJ = IJ+NO
      IF (A(IJ)-BLO(I)) 5,3,3
    3 IF (A(IJ)-BHI(I)) 4,4,6
    4 BETW(I) = BETW(I)+1.0
      GO TO 7
    5 UNDER(I) = UNDER(I)+1.0
      GO TO 7
    6 OVER(I) = OVER(I)+1.0
    7 CONTINUE
    8 CONTINUE
   12 RETURN
      END

Fig. 2. The Fortran program BOUND.


The transliteration process is illustrated by Fig. 3. Each part of the program is translated locally. The Fortran parameters are all turned into "in out" parameters of appropriate types in the Ada program. They are given the mode "in out" because every Fortran parameter can potentially be both an input value and an output value. The Fortran assignment statements are converted into equivalent Ada assignments. This requires very little change because Fortran is essentially a subset of Ada when it comes to arithmetic expressions and assignment statements. Fortran arithmetic IFs are expanded into equivalent Ada "if then else" statements branching to the appropriate labels. Arithmetic IFs where two of the labels are the same are treated as special cases in order to avoid the need for temporary variables. Each Fortran DO is expanded into an equivalent Ada "loop". The Ada "for" construct cannot be used because Ada "for" tests for termination at the top of the loop while Fortran DO tests for termination at the bottom of the loop. Fortran CONTINUE, RETURN, and GO TO are turned into Ada "null", "return", and "goto" respectively. The only aspect of the transliteration which is not totally local is that the Fortran program has to be scanned in order to determine what variables are used in the program so that appropriate variable declarations can be inserted at the beginning of the Ada program.
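The arithmetic IF rule described above, including the special cases for repeated labels, can be sketched as follows. The three special-case patterns are read directly off Fig. 3; the general case, where all three labels differ, is an assumption about how such a transliterator might proceed, using a hypothetical temporary `T` so that the expression is evaluated only once.

```python
from typing import List


def transliterate_arithmetic_if(expr: str, neg: int, zero: int, pos: int,
                                temp: str = "T") -> List[str]:
    """Expand Fortran IF (expr) neg,zero,pos into Ada, one statement at a time."""
    if neg == zero == pos:
        return [f"goto L{neg};"]
    if neg == zero:  # e.g. IF (BLO(I)-BHI(I)) 10,10,11
        return [f"if {expr}<=0.0 then goto L{neg};", f"else goto L{pos};", "end if;"]
    if zero == pos:  # e.g. IF (A(IJ)-BLO(I)) 5,3,3
        return [f"if {expr}<0.0 then goto L{neg};", f"else goto L{zero};", "end if;"]
    if neg == pos:  # e.g. IF (S(J)) 2,8,2
        return [f"if {expr}=0.0 then goto L{zero};", f"else goto L{neg};", "end if;"]
    # All labels differ: store expr in a (hypothetical) temporary first.
    return [f"{temp} := {expr};",
            f"if {temp}<0.0 then goto L{neg};",
            f"elsif {temp}=0.0 then goto L{zero};",
            f"else goto L{pos};",
            "end if;"]
```

Calling it with `("S(J)", 2, 8, 2)` produces the three lines that open the observation loop in Fig. 3. Note that the rule's output depends only on the IF statement itself, never on what the labels point to.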
type VECTOR is array (INTEGER range <>) of REAL;

procedure BOUND(A,S,BLO,BHI,UNDER,BETW,OVER: in out VECTOR;
                NO,NV,IER: in out INTEGER) is
   I,IJ,J,K: INTEGER;
begin
   IER := 0;
   I := 1;
   loop
      if BLO(I)-BHI(I)<=0.0 then goto L10;
      else goto L11;
      end if;
      <<L11>> IER := 1;
      goto L12;
      <<L10>> null;
      I := I+1;
      exit when I>NV;
   end loop;
   K := 1;
   loop
      UNDER(K) := 0.0;
      BETW(K) := 0.0;
      <<L1>> OVER(K) := 0.0;
      K := K+1;
      exit when K>NV;
   end loop;
   J := 1;
   loop
      IJ := J-NO;
      if S(J)=0.0 then goto L8;
      else goto L2;
      end if;
      <<L2>> I := 1;
      loop
         IJ := IJ+NO;
         if A(IJ)-BLO(I)<0.0 then goto L5;
         else goto L3;
         end if;
         <<L3>> if A(IJ)-BHI(I)<=0.0 then goto L4;
         else goto L6;
         end if;
         <<L4>> BETW(I) := BETW(I)+1.0;
         goto L7;
         <<L5>> UNDER(I) := UNDER(I)+1.0;
         goto L7;
         <<L6>> OVER(I) := OVER(I)+1.0;
         <<L7>> null;
         I := I+1;
         exit when I>NV;
      end loop;
      <<L8>> null;
      J := J+1;
      exit when J>NO;
   end loop;
   <<L12>> return;
end BOUND;

Fig. 3. A transliteration of Fig. 2 into Ada.
As is typically the case with transliteration, the program in Fig. 3, although correct, does not do a good job of satisfying the subsidiary goals of translation (in this case readability). Fig. 4 shows the final result after the refinement step of the translation process.


type VECTOR is array (INTEGER range <>) of REAL;

procedure BOUND(A,S,BLO,BHI: VECTOR;
                UNDER,BETW,OVER: in out VECTOR;
                NO,NV: INTEGER; IER: out INTEGER) is
   I,IJ,J,K: INTEGER;
begin
   IER := 0;
   I := 1;
   loop
      if BLO(I)-BHI(I)<=0.0 then goto L10; end if;
      IER := 1;
      return;
      <<L10>> I := I+1;
      exit when I>NV;
   end loop;
   K := 1;
   loop
      UNDER(K) := 0.0;
      BETW(K) := 0.0;
      OVER(K) := 0.0;
      K := K+1;
      exit when K>NV;
   end loop;
   J := 1;
   loop
      IJ := J-NO;
      if S(J)=0.0 then goto L8; end if;
      I := 1;
      loop
         IJ := IJ+NO;
         if A(IJ)-BLO(I)<0.0 then goto L5; end if;
         if A(IJ)-BHI(I)>0.0 then goto L6; end if;
         BETW(I) := BETW(I)+1.0;
         goto L7;
         <<L5>> UNDER(I) := UNDER(I)+1.0;
         goto L7;
         <<L6>> OVER(I) := OVER(I)+1.0;
         <<L7>> I := I+1;
         exit when I>NV;
      end loop;
      <<L8>> J := J+1;
      exit when J>NO;
   end loop;
end BOUND;

Fig. 4. A refined transliteration of Fig. 2 into Ada.


Fig. 4 is derived from Fig. 3 by applying a number of correctness-preserving transformations. Complex "if then else" statements which have clauses which branch to the next statement are simplified to remove these clauses. The branch to a "return" statement is replaced by a "return" statement. Unnecessary "null" statements, "return" statements, and labels are removed. Instead of giving all the parameters the mode "in out", some of the parameters are given just the mode "out" or "in" (the default in Ada). This is done in a purely syntactic way by noting that parameters which are never assigned to cannot be "out" and parameters which are never read cannot be "in".
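The purely syntactic mode refinement just described can be sketched as below. The string-based statement format, the function name `refine_modes`, and the regular expressions are assumptions made for brevity (a real translator would work on a parse tree); the rule itself follows the text. Subscripts on the left of `:=` (the `I` in `BETW(I) :=`) count as reads of the index, not of the array.

```python
import re
from typing import Dict, Iterable, List

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
LABEL = re.compile(r"<<\w+>>")


def refine_modes(params: List[str], stmts: Iterable[str]) -> Dict[str, str]:
    """Give each parameter the narrowest mode a syntactic scan can justify."""
    assigned, read = set(), set()
    for stmt in stmts:
        stmt = LABEL.sub("", stmt)
        if ":=" in stmt:
            lhs, rhs = stmt.split(":=", 1)
            names = IDENT.findall(lhs)
            assigned.add(names[0])  # the variable being assigned
            read.update(names[1:])  # subscripts such as the I in BETW(I)
            read.update(IDENT.findall(rhs))
        else:
            read.update(IDENT.findall(stmt))  # conditions, exit tests, ...
    modes = {}
    for p in params:
        if p in assigned and p in read:
            modes[p] = "in out"
        elif p in assigned:
            modes[p] = "out"
        else:
            modes[p] = "in"  # never assigned: Ada's default mode suffices
    return modes
```

Run over the statements of Fig. 3, this assigns "out" to IER, "in out" to UNDER, BETW, and OVER (each appears on both sides of an increment), and "in" to the rest, matching Fig. 4. Because it never asks which access happens first, it cannot discover the stronger conclusion about UNDER, BETW, and OVER discussed next.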

There are a number of transformations which could in principle have been applied to the program which have not been. For example, the computation involving UNDER, BETW, and OVER could be rearranged into one large "if then else". However, in keeping with the kinds of refinements typically supported by source-to-source translators (see Section VII), two criteria were used in order to decide which refinements to perform. First, no support was provided for transformations which require either control flow or data flow analysis of the program. This rules out transformations like the one suggested above.

Second, the main emphasis was placed on transformations which only look at an adjacent pair of statements. The only transformation which is more complicated than this is the one which refines the mode of the parameters. This transformation has to scan the program in order to determine which parameters are read and assigned. However, it does not do an actual data flow analysis. If it did, it would realize that UNDER, BETW, and OVER are actually "out" parameters and not "in out" parameters since they cannot be read until after they have been assigned.
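One adjacent-pair refinement from Fig. 4 can be sketched as follows, assuming each of Fig. 3's three-line "if then else" statements has been folded onto a single line. When one branch targets the label on the very next statement, that clause is dropped, and the comparison is inverted if the surviving branch was the "else". Only a statement and its successor are examined, with no control or data flow analysis. The function names, operator table, and patterns are illustrative and assume each condition contains a single relational operator, as every condition in Fig. 3 does; removing labels that become unused is left to a separate transformation.

```python
import re
from typing import List, Optional

IF_ELSE = re.compile(r"(<<\w+>> )?if (.+) then goto (\w+); else goto (\w+); end if;$")
LABELED = re.compile(r"<<(\w+)>> ")
RELOP = re.compile(r"<=|>=|/=|<|>|=")  # two-character operators tried first
INVERSE = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "=": "/=", "/=": "="}


def label_of(stmt: str) -> Optional[str]:
    m = LABELED.match(stmt)
    return m.group(1) if m else None


def invert(cond: str) -> str:
    # Replace the first (assumed only) relational operator by its negation.
    return RELOP.sub(lambda m: INVERSE[m.group(0)], cond, count=1)


def drop_fallthrough_clauses(stmts: List[str]) -> List[str]:
    """Examine each statement together with its successor only."""
    out = []
    for i, stmt in enumerate(stmts):
        following = label_of(stmts[i + 1]) if i + 1 < len(stmts) else None
        m = IF_ELSE.match(stmt)
        if m and following is not None:
            prefix, cond, then_label, else_label = m.groups()
            prefix = prefix or ""
            if else_label == following:
                stmt = f"{prefix}if {cond} then goto {then_label}; end if;"
            elif then_label == following:
                stmt = f"{prefix}if {invert(cond)} then goto {else_label}; end if;"
        out.append(stmt)
    return out
```

Applied to the statement labeled L3 in Fig. 3, whose "then" branch goes to the immediately following L4, this yields `if A(IJ)-BHI(I)>0.0 then goto L6; end if;`, the form seen in Fig. 4.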

Fig. 4 is readable, but still not as good as one would like. In particular, it falls far short of the goal of producing the program the programmers would have produced had they been writing in Ada: it is a Fortran-style Ada program instead of an Ada-style Ada program. As will be discussed in Section III, better translations of Fig. 2 can be achieved by means of translation via abstraction and reimplementation.

Figs. 3 & 4 are not the output of any particular translator. Rather, they are hypothetical examples intended to illustrate the process of transliteration and refinement. However, it is not clear that any existing source-to-source translator produces output which is significantly better than Fig. 4 (see Section VII).

Advantages of Transliteration and Refinement

Translation via transliteration and refinement has several advantages. Most importantly, it uses a divide and conquer strategy in order to satisfy the goals of translation. The basic goal of obtaining a correct translation is achieved by the transliteration step. The refinement step need only guarantee that it preserves this correctness. The subsidiary goals of the translation (e.g., efficiency or readability) are achieved by the refinement step. The transliteration step is greatly simplified by not having to worry about the subsidiary goals.

Another advantage is that the localized nature of the transliteration step makes it easy to encode the basic knowledge needed for translation. This knowledge is economically represented by stating how each of the constructs in the source language should be converted into equivalent constructs in the target language. The transliteration step need not have any knowledge about how special combinations of source constructs can be represented as special combinations of target constructs. (This latter kind of knowledge is the province of the refinement step, which presumably knows how to fine-tune cumbersome combinations of target constructs.)

A final advantage of translation via transliteration and refinement is that it makes it easy to construct families of translators which share the same target language and the same refinement step. For example, one might construct a family of compilers which compile various high level languages into the same machine language and which share the same refinement step.

Transliteration Is Not Always Practical

Although it works satisfactorily in many situations, translation via transliteration and refinement has some fundamental disadvantages. To begin with, it assumes that transliteration is practical. This in turn depends on the assumption that each of the source language constructs can be individually translated into target language constructs in a practical way. Unfortunately, this is not always the case.

The main way in which transliteration can be blocked is that the source language may support a primitive construct which is not supported by the target language. For example, consider translating from a language which supports GOTOs into a language which does not, or from a language which supports multiple assignments to a variable into a functional language which does not. In the case of Fortran and Ada, consider the fact that Ada has nothing which is equivalent to the Fortran EQUIVALENCE statement.

The primary source of incompleteness in current translators is primitive constructs which cannot be transliterated. Current translators typically just ignore non-transliteratable constructs, either refusing to process source programs which contain them or copying them unchanged from the source to the target. Human intervention is required either to remove them from the source or to fix them up in the target.

A second way in which transliteration can be blocked is that the source and target languages may have constructs which, although they correspond closely, differ in significant semantic details. Most of the time these details may not matter for translation. However, when they matter they are liable to matter a lot. For example, consider translating into a language which forces complex data structures to be copied when they are assigned to a variable from a language which does not, or between languages which differ in their variable scoping rules. In the case of Fortran and Ada, consider the fact that vector arguments to Fortran subroutines are passed by reference while Ada specifies that it is undefined whether or not vector arguments will be copied or passed by reference.
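Why the parameter-passing difference can matter is easiest to see when the same vector is passed for two parameters. The Python sketch below simulates both conventions on a hypothetical two-parameter subroutine `p`; the helper names are illustrative, and the left-to-right copy-back order is an assumption. Passing by reference and copying in and out give different answers, which is exactly the choice the text notes Ada leaves open.

```python
from typing import Callable, List

Vector = List[float]


def p(x: Vector, y: Vector) -> None:
    """Hypothetical subroutine body: X(1) = X(1)+1 ; Y(1) = Y(1)+1."""
    x[0] += 1.0
    y[0] += 1.0


def call_by_reference(proc: Callable[..., None], *args: Vector) -> None:
    # Python passes list objects by reference, so aliased arguments stay aliased.
    proc(*args)


def call_by_copy(proc: Callable[..., None], *args: Vector) -> None:
    copies = [list(a) for a in args]  # copy in
    proc(*copies)
    for original, result in zip(args, copies):
        original[:] = result  # copy out, left to right (assumed order)


v = [0.0]
call_by_reference(p, v, v)  # both increments hit the same storage
w = [0.0]
call_by_copy(p, w, w)  # the second copy-back overwrites the first
print(v, w)  # prints [2.0] [1.0]
```

A transliterator that emits the Ada call without checking for such aliasing is making precisely the blind assumption criticized below.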

The primary source of incorrectness in current translators is constructs which can be transliterated straightforwardly most of the time but only with great difficulty (or not at all) in certain hard-to-detect situations. Current translators typically just use the straightforward transliteration all of the time without giving any indication that there might be a problem. (For example, the transliteration in Fig. 3 blindly assumes that it does not matter how the vector parameters get passed.) Human intervention is required in order to correct any problems which arise in the target program produced.

Transliteration Complicates Refinement

A second fundamental disadvantage of translation via transliteration and refinement is an unintended byproduct of its greatest advantage. The principal virtue of the transliteration and refinement approach is that it simplifies the problem of satisfying the primary goal of translation (i.e., correctness) by factoring out the problem of satisfying the subsidiary goals. This is particularly unfortunate since the subsidiary goals are usually harder to satisfy than the primary goal.

The basic reason why translation via transliteration and refinement complicates the task of satisfying the subsidiary goals of translation is that typically the process of transliteration does not merely ignore the subsidiary goals, it works against them. Simply put, whether or not the original source program is good from the point of view of the subsidiary goals of the translation, the output of the transliteration step is almost always guaranteed to be bad from this point of view.

The most obvious way in which transliteration makes things difficult for later refinement is that, more often than not, the transliteration of a given construct in the source language requires the use of a circumlocution in the target language. The only time when this can be completely avoided is when the target language possesses a semantically identical construct. Examples of both of these cases can be seen in Fig. 3. The DO loops in the Fortran program are converted into cumbersome "loop" statements in the Ada program. In contrast, the assignment statements remain essentially unchanged.
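The DO-loop circumlocution comes from a single local rule, sketched here with a hypothetical function name. Because Fortran DO tests for termination at the bottom of the loop, the rule must emit a bottom-tested "loop". Emitting the idiomatic `for K in 1..NV loop` instead would be safe only if something guaranteed that the loop bounds make the two forms agree (for example, NV >= 1), and a rule that looks at one statement has no way to know that.

```python
from typing import List


def transliterate_do(var: str, first: str, last: str, body: List[str]) -> List[str]:
    """Expand DO n VAR = FIRST, LAST into an Ada loop that tests at the bottom."""
    return (
        [f"{var} := {first};", "loop"]
        + ["   " + line for line in body]
        + [f"   {var} := {var}+1;", f"   exit when {var}>{last};", "end loop;"]
    )
```

For instance, `transliterate_do("K", "1", "NV", ["UNDER(K) := 0.0;"])` reproduces the shape of the second loop in Fig. 3: an initialization, a "loop", the body, an increment, and a bottom "exit when".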

A more subtle way in which transliteration makes things difficult for later refinement is that it tends to obscure the key features of the algorithm implemented by the program being translated. Transliteration does this through both camouflage and the creation of decoys. The mass of circumlocutions produced by transliteration acts as camouflage hiding the key features. Decoys (features which are prominent but actually unimportant) are created because the code produced is sensitive to unimportant details of the source. For example, Fig. 3 would have looked quite different if the Fortran programmer had used logical IFs instead of arithmetic IFs. A kind of indirect camouflage is produced due to the fact that the transliteration step is insensitive to global considerations. Transliteration typically renders a given construct in exactly the same way even if the context would suggest that it should be translated differently. For example, all of the parameters are given the mode "in out" in Fig. 3 whether or not this is actually necessary given the way they are used.

A final way in which transliteration makes things difficult for later refinement is that useful information about the source program can get lost. As an example of this, consider translating from a language (such as Ada) where the order of evaluation of the arguments of a function call is undefined to a language where the order is defined. In this situation, straightforward transliteration will define an evaluation order and thereby discard the information that many evaluation orders are equally acceptable. This loss of information makes it hard for the refinement step to apply transformations which are not applicable to the chosen evaluation order but which are applicable to one of the evaluation orders which was not chosen.

Applicability of Transliteration and Refinement

The primary requirement for the applicability of translation via transliteration and refinement is that transliteration must be practical. For this to be the case, the target language must support all of the primitive constructs supported by the source language. In general, this implies that the target language must be at a lower
Translation via transliteration and refinement is perhaps most applicable in compilation, because any primitive construct can be expressed in machine language. In contrast, source-to-source translators typically have to restrict the input language and/or admit possibly incorrect translations in order to make transliteration practical.

A second limitation on the applicability of translation via transliteration and refinement is that refinement is an inherently difficult task which transliteration makes more difficult. As a result, the transliteration and refinement approach is most applicable in situations where the subsidiary goals of translation are not too stringent.

Transliteration and refinement works well in a straightforward compiler where readability of the output is not an issue and only moderate efficiency is required in the output code. In order to achieve significantly higher levels of efficiency in the output code, optimizing compilers expend an enormous amount of effort on refinement.


III - Translation via Abstraction and Reimplementation


As shown in Fig. 5, translation via abstraction and reimplementation operates in two steps. The abstraction
step performs a global analysis of the source program. The goal of this analysis is to obtain an understanding of
the algorithms being used by the program. The abstract description highlights the essential features of these
algorithms while deliberately throwing away information about unimportant features of the program.


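[Diagram: ABSTRACTION maps the source program to an abstract description; REIMPLEMENTATION maps the abstract description to a program in the target language.]

Fig. 5. Translation via abstraction and reimplementation.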


The reimplementation step takes the abstract description produced by the abstraction step and creates a
program in the target language which implements this description. In order to simplify this task, the abstract
description is designed so that it contains exactly the right kind of information needed in order to guide the
reimplementation process.


The basic difference between translation via transliteration and refinement and translation via abstraction and
reimplementation can be seen by comparing the shapes of Figs. 1 & 5. The transliteration and refinement
approach translates directly to the target language. In contrast, the abstraction and reimplementation approach
first translates the source program up to a very high level description and then translates this description down to
the target language.

Like translation via transliteration and refinement, translation via abstraction and reimplementation uses a
divide and conquer strategy to attack the translation task. However, it divides the translation task differently. The
transliteration and refinement approach separates the problem of satisfying the primary goal of translation from
the problem of satisfying the subsidiary goals of translation. In contrast, the abstraction and reimplementation

Example of Abstraction and Reimplementation

As an example of translation via abstraction and reimplementation, consider how this approach could be used
to translate the Fortran program in Fig. 2 into Ada. The first step is to obtain an abstract description of the
computation in Fig. 2. Fig. 6 shows the key elements of such an abstract description.

Fig. 6 is divided into three parts. The first part lists the parameters of the program BOUND and their types as
specified in the original Fortran program. (By convention, the Scientific Subroutine Package uses the dimension
specification V(1) to specify a vector of unknown length rather than a vector of length one.) A complete data
flow analysis of the program is used in order to determine which parameters are "in" and which are "out". This
analysis reveals that UNDER, BETW, and OVER are never read before they are written and are therefore "out"
parameters.
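A minimal Python sketch of the kind of mode assignment described above, under a simplifying assumption: instead of a data flow graph, the program is given as a hypothetical list of execution paths, each a sequence of (parameter, "read" or "write") events. A parameter that is never written is "in"; one that is written but never read before being written on any path is "out"; anything else is "in out". The two traces below are loosely modeled on BOUND and are not derived from the actual Fortran text.

```python
def classify_modes(params, paths):
    """Assign Ada-style parameter modes from read/write traces (simplified sketch)."""
    modes = {}
    for p in params:
        written = False
        read_first = False
        for path in paths:
            seen_write = False
            for name, op in path:
                if name != p:
                    continue
                if op == "write":
                    seen_write = True
                    written = True
                elif op == "read" and not seen_write:
                    read_first = True        # read before any write on this path
        if not written:
            modes[p] = "in"
        elif read_first:
            modes[p] = "in out"
        else:
            modes[p] = "out"
    return modes


# Hypothetical traces: the error exit and one pass through the main loop.
error_path = [("IER", "write"), ("BLO", "read"), ("BHI", "read"), ("IER", "write")]
main_path = [("IER", "write"), ("BLO", "read"), ("BHI", "read"),
             ("UNDER", "write"), ("S", "read"), ("A", "read"),
             ("BLO", "read"), ("UNDER", "read"), ("UNDER", "write")]
modes = classify_modes(["A", "S", "BLO", "BHI", "UNDER", "IER"], [error_path, main_path])
```

A real analysis would work on the program's data flow rather than on enumerated paths, since loops can produce unboundedly many paths.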

The second part of Fig. 6 lists a number of constraints which must be satisfied in order for the program BOUND
to produce reasonable results. The first seven constraints state that the ranges of the various vector parameters
must be large enough to prevent referencing memory locations outside of the vectors. These constraints are
determined by looking at the largest values which the various index variables in the program can reach.
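Because every loop bound in BOUND depends only on NO and NV, the extents behind these seven constraints can be found by replaying the loops on index values alone. A minimal Python sketch follows. The function name is hypothetical, the loop structure is taken from the Ada translation in Fig. 7, and the worst case is assumed, in which every S(J) is non-zero and any branch may be taken.

```python
def index_extents(no, nv):
    """Smallest and largest subscript each vector parameter of BOUND receives.

    Assumes NO >= 1 and NV >= 1; otherwise some index sets stay empty.
    """
    used = {v: set() for v in ("A", "S", "BLO", "BHI", "UNDER", "BETW", "OVER")}
    for i in range(1, nv + 1):              # BLO/BHI validity check
        used["BLO"].add(i)
        used["BHI"].add(i)
    for k in range(1, nv + 1):              # initialization of the counts
        for v in ("UNDER", "BETW", "OVER"):
            used[v].add(k)
    for j in range(1, no + 1):              # main doubly nested loop
        used["S"].add(j)
        ij = j - no
        for i in range(1, nv + 1):
            ij += no
            used["A"].add(ij)
            for v in ("BLO", "BHI", "UNDER", "BETW", "OVER"):
                used[v].add(i)
    return {v: (min(s), max(s)) for v, s in used.items()}
```

For NO = 3 and NV = 4, A is subscripted over 1..12, that is 1..NV*NO, while S needs 1..NO and the other vectors need 1..NV.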

The last two constraints specify that the parameters NO and NV must be positive and therefore that the vector
parameters must have positive extent. These are particularly interesting constraints because they imply that Ada
"for" loops can be used when translating the program. The constraints follow from the observation that a
Fortran DO loop which enumerates the elements of an array does not operate correctly when given an array of zero
extent. The problem is that the body of a Fortran DO loop is always executed at least once, even if the limits
placed on the DO variable suggest that zero executions would be more appropriate. (This feature of DO is
occasionally used in a constructive way by Fortran DO loops which do not enumerate the elements of arrays.)
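The trip-count difference can be stated precisely. The Python sketch below uses hypothetical function names and contrasts a DO loop whose limit test comes after the body, which is the behavior described above and was common in pre-Fortran 77 compilers, with an Ada "for" loop. Fortran 77 later made zero-trip DO loops standard.

```python
def one_trip_do_count(first, last, step=1):
    """Iterations of 'DO label I = first, last, step' when the limit test follows the body."""
    trips = 0
    i = first
    while True:
        trips += 1          # the body runs before the limit is ever checked
        i += step
        if i > last:
            break
    return trips


def ada_for_count(first, last):
    """Iterations of 'for I in first..last loop': zero for a null range."""
    return max(0, last - first + 1)
```

With NV = 0 the DO version would still execute its body once (one_trip_do_count(1, 0) is 1), while the Ada loop executes zero times. This is why the translator must establish NV ≥ 1 and NO ≥ 1 before substituting Ada loops.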

The third part of Fig. 6 describes the computation performed by the program. The first two lines specify that
the program checks to see that every element of BLO is less than or equal to the corresponding element of BHI. If
this is true then IER is set to zero. Otherwise, IER is set to one and the program is terminated.

The remainder of Fig. 6 describes the main computation performed by the program BOUND in terms of
recurrence equations. The main body of the program is a doubly nested loop iterating over the index variables J
and I. The various evaluations of the body of the inner loop can be referred to in terms of the corresponding
values of the index variables. The notation X_{m,n} is used to refer to the value of the variable X at the end of the
evaluation of the inner loop body during which the outer loop index has the value m and the inner loop index has
the value n. The recurrence equations specify how variable values corresponding to a given evaluation of the
inner loop body are computed from values corresponding to earlier evaluations. The recurrence equations are
derived by inspecting the data flow in the loops. As part of this process, the middle loop in the Fortran code is
revealed to be part of the initialization for the main loop in the program.
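Solving the two equations for IJ in Fig. 6 in closed form shows where the first constraint, A'RANGE ⊇ 1..NV*NO, comes from. The following short derivation uses only those equations, with j ∈ 1..NO and i ∈ 1..NV:

```latex
\begin{aligned}
IJ_{j,i} &= IJ_{j,i-1} + NO = IJ_{j,0} + i \cdot NO = (j - NO) + i \cdot NO = j + (i-1)\cdot NO,\\
\min_{j,i} IJ_{j,i} &= IJ_{1,1} = 1, \qquad
\max_{j,i} IJ_{j,i} = IJ_{NO,NV} = NO + (NV-1)\cdot NO = NV \cdot NO .
\end{aligned}
```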

The fact that Fig. 6 is shown in a textual form is not intended to imply that the abstract description would
actually be represented textually. For example, it might take the form of expressions annotating a data
PARAMETERS:
  in  A,S,BLO,BHI: vector of real
  out UNDER,BETW,OVER: vector of real
  in  NO,NV: integer
  out IER: integer
CONSTRAINTS:
  A'RANGE ⊇ 1..NV*NO, S'RANGE ⊇ 1..NO,
  BLO'RANGE ⊇ 1..NV, BHI'RANGE ⊇ 1..NV,
  UNDER'RANGE ⊇ 1..NV, BETW'RANGE ⊇ 1..NV, OVER'RANGE ⊇ 1..NV,
  NO ≥ 1, NV ≥ 1
COMPUTATION:
  if (∀ I∈1..NV  BLO(I) ≤ BHI(I)) then IER = 0
  else IER = 1 ∧ computation is aborted
  The main computation is a doubly nested loop
    The outer index (first subscript) counts from 1 to NO
    The inner index (second subscript) counts from 1 to NV
    The variables assigned within the loops have the following values:
    ∀ j∈1..NO, i∈1..NV, K∈1..NV
      IJ_{j,0} = j-NO
      IJ_{j,i} = IJ_{j,i-1}+NO
      UNDER(K)_{0,i} = 0.0
      if K = i ∧ S(j)≠0.0 ∧ A(IJ_{j,i}) < BLO(i)
        then UNDER(K)_{j,i} = 1.0+UNDER(K)_{j-1,i}
        else UNDER(K)_{j,i} = UNDER(K)_{j-1,i}
      BETW(K)_{0,i} = 0.0
      if K = i ∧ S(j)≠0.0 ∧ BLO(i) ≤ A(IJ_{j,i}) ≤ BHI(i)
        then BETW(K)_{j,i} = 1.0+BETW(K)_{j-1,i}
        else BETW(K)_{j,i} = BETW(K)_{j-1,i}
      OVER(K)_{0,i} = 0.0
      if K = i ∧ S(j)≠0.0 ∧ BHI(i) < A(IJ_{j,i})
        then OVER(K)_{j,i} = 1.0+OVER(K)_{j-1,i}
        else OVER(K)_{j,i} = OVER(K)_{j-1,i}

Fig. 6. An abstract description of Fig. 2.


Based on the abstract description in Fig. 6, it is a straightforward matter to create a quality translation of the
program BOUND into Ada as shown in Fig. 7. The parameters listed in Fig. 6 become parameters of the Ada procedure,
with the specified types and modes. The recurrence equations map directly into a triply nested loop over J, I, and K.
Transformations similar to those used by an optimizing compiler can be used to get rid of the unnecessary
innermost loop over K and to move the test S(J) /= 0.0 out to the outermost loop, since it is invariant in the inner loop.
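These transformations can be checked on a small scale. The Python sketch below uses hypothetical function names. Lists are used 1-based with element 0 ignored, and A is the flat vector indexed through IJ, as in the Fortran original. The first function is a literal triply nested loop with the K = i guards. The second is the version obtained by deleting the K loop, hoisting the invariant S(J) test, and merging the three guards into one if/elsif/else in the manner of Fig. 7.

```python
def counts_from_recurrences(a, s, blo, bhi, no, nv):
    """Literal triply nested loop over J, I and K corresponding to Fig. 6."""
    under, betw, over = [0.0] * (nv + 1), [0.0] * (nv + 1), [0.0] * (nv + 1)
    for j in range(1, no + 1):
        ij = j - no
        for i in range(1, nv + 1):
            ij += no
            for k in range(1, nv + 1):
                # Three independent guarded updates, one per recurrence.
                if k == i and s[j] != 0.0 and a[ij] < blo[i]:
                    under[k] += 1.0
                if k == i and s[j] != 0.0 and blo[i] <= a[ij] <= bhi[i]:
                    betw[k] += 1.0
                if k == i and s[j] != 0.0 and bhi[i] < a[ij]:
                    over[k] += 1.0
    return under[1:], betw[1:], over[1:]


def counts_transformed(a, s, blo, bhi, no, nv):
    """After the transformations: no K loop (only K = I contributes), the S(J)
    test hoisted out of the inner loop, and the guards merged (cf. Fig. 7)."""
    under, betw, over = [0.0] * (nv + 1), [0.0] * (nv + 1), [0.0] * (nv + 1)
    for j in range(1, no + 1):
        if s[j] != 0.0:
            ij = j - no
            for i in range(1, nv + 1):
                ij += no
                if a[ij] < blo[i]:
                    under[i] += 1.0
                elif bhi[i] < a[ij]:
                    over[i] += 1.0
                else:
                    betw[i] += 1.0
    return under[1:], betw[1:], over[1:]
```

Merging the guards is valid only because BLO(I) ≤ BHI(I) has already been checked. Without that check, an element could satisfy both the UNDER and the OVER condition.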

A comparison of Fig. 7 with Fig. 4 shows that the translation in Fig. 7 is superior in several respects. Most
notably, the parameters have all been given the correct modes; labels and "goto" statements have been
eliminated in favor of complex "if then else" statements; and "for" loops have been used.

Some of the improvements which are seen in Fig. 7 could have been achieved in Fig. 4 if local refinement had
been applied more aggressively. For example, local transformations probably could have been used to combine
the simple "if then else" statements in Fig. 4 with the statements following them in order to create the
"if then else" statements shown in Fig. 7.

However, improvements such as determining the proper modes for the parameters and utilizing "for" loops
depend critically on an understanding of the program as a whole. These changes cannot be made until after a
global analysis of the program has determined that the changes are valid.


type VECTOR is array (INTEGER range <>) of REAL;

procedure BOUND (A, S, BLO, BHI: VECTOR;
                 UNDER, BETW, OVER: out VECTOR;
                 NO, NV: INTEGER; IER: out INTEGER) is
   I, IJ, J, K: INTEGER;
begin
   IER := 0;
   for I in 1..NV loop
      if BLO(I) > BHI(I) then IER := 1; return; end if;
   end loop;
   for K in 1..NV loop
      UNDER(K) := 0.0;
      BETW(K) := 0.0;
      OVER(K) := 0.0;
   end loop;
   for J in 1..NO loop
      if S(J) /= 0.0 then
         IJ := J-NO;
         for I in 1..NV loop
            IJ := IJ+NO;
            if A(IJ) < BLO(I) then UNDER(I) := UNDER(I)+1.0;
            elsif BHI(I) < A(IJ) then OVER(I) := OVER(I)+1.0;
            else BETW(I) := BETW(I)+1.0;
            end if;
         end loop;
      end if;
   end loop;
end BOUND;

Fig. 7. A translation of Fig. 2 into Ada based on Fig. 6.


While Fig. 7 is a good translation of Fig. 2 into Ada, it is still far from optimal. Appropriate Ada-style
constructs have been used; however, the result is still essentially a Fortran-style program. In particular,
declaring A as a vector even though it is really a matrix, and allowing the vector parameters to have ranges
larger than the ranges indicated by the parameters NO and NV, are in the style of the Fortran Scientific
Subroutine Package but not in the style of Ada.

Fig. 7 is shown as it is because it is just about the best translation which can be achieved if the parameters and
their types are required to remain the same as in the Fortran program. In addition, it illustrates the kind of
translation which can be achieved by using an abstract representation which is only moderately abstract.


Example  of  Increased  Abstraction 

Figs. 8 & 9 show a translation of the program BOUND into Ada which is better than the one shown in Fig. 7, together
with the abstract description on which it is based. There are two fundamental ways in which the translation shown in
these figures is different from the one shown in Figs. 6 & 7. First, Figs. 8 & 9 assume that the program BOUND and
the programs which call it are being translated together. This opens up two new avenues of attack on the
translation problem. The programs which call BOUND can be inspected in order to obtain additional information
about BOUND. The interface to the program BOUND can be altered in order to make the program read more
naturally in Ada.

In Fig. 8 it is assumed that an analysis of the programs which call BOUND shows that BOUND is only called with
vectors which have the exact sizes indicated by the parameters NO and NV. This makes it possible to tighten up the
constraints in the description and to eliminate all mention of the variables NO and NV in favor of using the Ada
array attribute 'RANGE applied to the parameters.

The second fundamental difference between Figs. 8 & 9 and Figs. 6 & 7 is that Fig. 8 is significantly more
abstract than Fig. 6. The computations being performed are described in terms of their net effects. The
computations involving UNDER, BETW, and OVER are described as computing a count of elements of A which have
certain properties. The variable S is described as a vector of flags which are tested. A is described directly as a
matrix, and no mention is made of the variable IJ. The computation involving IER is summarized by stating that
the computation is aborted and an error signalled if the first constraint is violated. No mention is made of how
this might be done.

LOGICAL INPUTS:
  A                matrix of real
  S                vector of flag
  BLO,BHI          vector of real
LOGICAL OUTPUTS:
  UNDER,BETW,OVER  vector of count
  error signaled (and computation aborted) if constraint (1) is violated
CONSTRAINTS:
  (1) ∀ I∈BLO'RANGE  BLO(I) ≤ BHI(I)
  (2) A'RANGE(1) = BLO'RANGE = BHI'RANGE = UNDER'RANGE = BETW'RANGE = OVER'RANGE
  (3) A'RANGE(2) = S'RANGE
COMPUTATION:
  ∀ I∈UNDER'RANGE
    UNDER(I) = count-of {J∈S'RANGE | S(J) ∧ A(I,J) < BLO(I)}
  ∀ I∈BETW'RANGE
    BETW(I) = count-of {J∈S'RANGE | S(J) ∧ BLO(I) ≤ A(I,J) ≤ BHI(I)}
  ∀ I∈OVER'RANGE
    OVER(I) = count-of {J∈S'RANGE | S(J) ∧ BHI(I) < A(I,J)}

Fig. 8. A more abstract description of Fig. 2.

The key to the increase in abstraction in Fig. 8 is the ability to recognize the net effects of a computation. This
in turn depends on the abstraction component having a significant amount of knowledge about what kinds of
computations can be performed. For example, it can presumably recognize that the recurrence equations in Fig. 6
compute counts and that the computation involving the variable IJ converts matrix indices to vector indices.
Similarly, it can recognize that the computation involving the variable IER reflects the standard way that error
conditions are signalled in the Fortran Scientific Subroutine Library.
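Recognizing that IJ converts matrix indices to vector indices amounts to noticing that the value IJ takes for loop indices (I, J) equals the row-major offset of element (I, J) in an NV-by-NO matrix. Equivalently, it is Fortran's column-major offset of element (J, I) in an NO-by-NV array. A Python sketch of that check follows; the helper names are hypothetical.

```python
def ij_by_loop(no, nv):
    """(I, J) -> IJ exactly as BOUND computes it: IJ := J-NO, then IJ := IJ+NO per I."""
    table = {}
    for j in range(1, no + 1):
        ij = j - no
        for i in range(1, nv + 1):
            ij += no
            table[(i, j)] = ij
    return table


def row_major_offset(i, j, ncols):
    """1-based position of element (I, J) when a matrix with ncols columns
    is stored row after row in a flat vector."""
    return (i - 1) * ncols + j


def as_matrix(flat, no, nv):
    """Rebuild the NV-by-NO matrix encoded by the flat vector A (flat[0] is ignored)."""
    return [[flat[row_major_offset(i, j, no)] for j in range(1, no + 1)]
            for i in range(1, nv + 1)]
```

Once the equality holds for every (I, J), an abstraction step can replace A(IJ) by A(I, J) and drop IJ entirely, as Fig. 8 does.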

Based on Fig. 8, the reimplementation step can produce a much better program (Fig. 9) than the one
shown in Fig. 7 because it has fewer restrictions placed on it. It can choose better parameters and better types
because the abstract description does not require that the parameters and types be the same as in the Fortran
program. It is free to implement the error signalling using standard Ada methods, i.e., by raising an exception
instead of returning an error value which has to be explicitly checked by the caller. Due to the stronger
constraints on the length of the vectors, array literals can be used to initialize the vectors UNDER, BETW, and OVER
instead of a loop.
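The net-effect description in Fig. 8 maps naturally onto a language with exceptions and collection operations. The following is a minimal Python sketch of the reimplementation choices discussed above. It assumes A is given as a list of rows (one per I) and S as a list of booleans (one per J). Python's ValueError stands in for Ada's CONSTRAINT_ERROR, the explicit shape check is an added assumption not present in Fig. 9, and the function name is hypothetical.

```python
def bound(a, s, blo, bhi):
    """Per variable I, count the selected observations below, within and above
    BLO(I)..BHI(I), following the count-of equations of Fig. 8."""
    # Constraint (1): signal an error instead of returning an IER flag.
    if any(lo > hi for lo, hi in zip(blo, bhi)):
        raise ValueError("constraint (1) violated: BLO(I) > BHI(I) for some I")
    # Constraints (2) and (3): the shapes must agree exactly.
    if not (len(a) == len(blo) == len(bhi) and all(len(row) == len(s) for row in a)):
        raise ValueError("array extents do not match")
    under, betw, over = [], [], []
    for row, lo, hi in zip(a, blo, bhi):                  # one row of A per I
        chosen = [x for x, flag in zip(row, s) if flag]   # the J with S(J) true
        under.append(sum(1 for x in chosen if x < lo))
        betw.append(sum(1 for x in chosen if lo <= x <= hi))
        over.append(sum(1 for x in chosen if hi < x))
    return under, betw, over
```

The counts are integers here, mirroring the VECT type of Fig. 9 rather than the REAL counts of Fig. 7.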

In some situations, the added freedom does not cause any change in the translation. For example, the
reimplementation step could have computed the counts in several different ways. However, none of these
methods would have been any better than the one shown in Fig. 7, so the same method was used in Fig. 9.

There is a price which has to be paid in order to get the improved translation shown in Fig. 9. Analysis is
made more complicated by the need to recognize the net effects of the computation being performed. In
addition, reimplementation is made more complicated because there are more implementation decisions which
have to be made.


type VECTOR is array (INTEGER range <>) of REAL;
type BOOLS is array (INTEGER range <>) of BOOLEAN;
type VECT is array (INTEGER range <>) of INTEGER;
type MATRIX is array (INTEGER range <>, INTEGER range <>) of REAL;

procedure BOUND (A: MATRIX; S: BOOLS; BLO, BHI: VECTOR;
                 UNDER, BETW, OVER: out VECT) is
   I, J: INTEGER;
begin
   for I in BLO'RANGE loop
      if BLO(I) > BHI(I) then raise CONSTRAINT_ERROR; end if;
   end loop;
   UNDER := (UNDER'RANGE => 0);
   BETW := (BETW'RANGE => 0);
   OVER := (OVER'RANGE => 0);
   for J in A'RANGE(2) loop
      if S(J) then
         for I in A'RANGE(1) loop
            if A(I,J) < BLO(I) then UNDER(I) := UNDER(I)+1;
            elsif BHI(I) < A(I,J) then OVER(I) := OVER(I)+1;
            else BETW(I) := BETW(I)+1;
            end if;
         end loop;
      end if;
   end loop;
end BOUND;

Fig. 9. A translation of Fig. 2 into Ada based on Fig. 8.


Figs. 6-9 are not produced by any particular translator. Rather, they are hypothetical examples intended to
illustrate the process of abstraction and reimplementation. In particular, they demonstrate that increased
abstraction leads to improved translation. In the limit it is possible to create a translation which compares
favorably with the program the programmers would have written had they been writing in the target language.

Advantages of Abstraction and Reimplementation

The most important advantage of translation via abstraction and reimplementation is that, while translation via
transliteration and refinement is, in essence, designed to facilitate achieving the primary goal of translation (i.e.,
correctness), translation via abstraction and reimplementation is specifically designed to facilitate achieving the
subsidiary goals of translation. As discussed in Section II, transliteration creates many problems for later
refinement. In contrast, the sole purpose of abstraction is to simplify later reimplementation. Sections IV & V
give extended examples of the way in which abstraction and reimplementation can cooperate in order to produce
high quality translation.

A second important advantage of translation via abstraction and reimplementation is that it is not limited by
the practicality of transliteration. As discussed in Section II, the local nature of transliteration can cause it to be
blocked even though overall translation is possible. In contrast, there is no a priori reason for abstraction to ever
be blocked since the result of abstraction is not constrained by the target language. Further, reimplementation
need not be blocked as long as overall translation is possible.

A final virtue of translation via abstraction and reimplementation is that it lends itself to the construction of
families of translators which share components at least as well as, if not better than, translation via
transliteration and refinement. In this regard, note that designing an abstract representation which is compatible
with a diverse set of target languages is easier than designing a target-like intermediate language which is
compatible with them.

Disadvantages of Abstraction and Reimplementation


Like translation via transliteration and refinement, translation via abstraction and reimplementation has a
fundamental problem of incompleteness. Unlike transliteration, abstraction and reimplementation are always
possible as long as translation is possible. However, it would not be reasonable to assume that these processes will
always be practical. When they are not, a translator will have to fall back on some other method of translation.
For example, it might use transliteration (or ask for human assistance) in order to translate those parts of a
program which could not be usefully abstracted and/or reimplemented.

A key issue then is what percentage of a typical source program can be practically abstracted and
reimplemented. This question can only be answered in the context of a particular application. However, two
general statements can be made. First, any particular deficiency in abstraction or reimplementation can be
rectified by adding more knowledge into the abstraction and reimplementation modules. Second, the limits of
abstraction and reimplementation are essentially orthogonal to the limits of transliteration. Therefore, a translator
which uses abstraction and reimplementation and which falls back on transliteration should always be more
complete than one which uses transliteration alone.

Another disadvantage of the abstraction and reimplementation approach is that it is more complicated than
transliteration and refinement. All in all, in situations where transliteration is practical and little refinement is
necessary, translation via transliteration and refinement is probably the approach of choice. However,
translation via abstraction and reimplementation can succeed in producing high quality output where translation
via transliteration and refinement would fail.


IV - Satch: Translating from Cobol to Hibol

Faust's Satch system [10] uses abstraction and reimplementation in order to attack a problem which is
particularly difficult for translation by transliteration and refinement: translation from a low level programming
language to a high level programming language. There are two key problems with this kind of translation. First,
transliteration is usually not practical. Second, the subsidiary goal of such a translation is readability, which is an
exceptionally difficult goal to satisfy well.

In the case of Satch, the source language is Cobol [36] and the target language is Hibol [20]. The motivation
behind the translation performed by Satch is the desire to convert pre-existing Cobol programs into a form where
they can be more easily maintained. The benefits of the translation are illustrated by the fact that the resulting
Hibol program can be as much as an order of magnitude shorter than the original Cobol program.

Hibol is a special purpose business data processing language. It is a very high level, non-procedural, single
assignment language which is based on the concept of a flow. A flow is a multidimensional aggregate of data
values which are indexed by one or more keys. Each Hibol statement specifies how a flow is computed from other
flows. This is done by specifying how a typical element of the output flow is computed from typical elements of
the input flows. An important advantage of Hibol is that both file I/O and iteration over the elements of flows are
implicit in a Hibol program and therefore do not have to be explicitly specified by the programmer. Fig. 11
(which will be discussed below) shows an example of a Hibol program.

A key aspect of the non-procedural nature of Hibol is that there is no explicit control flow in a Hibol program.
The statements in a Hibol program are unordered and there are no flow of control constructs such as conditionals
or loops. As a result of this, direct transliteration from a programming language such as Cobol which has flow of
control constructs to Hibol is not practical.


Example of Satch's Translation

Figs. 10 & 11 (adapted from [10]) show an example of a translation performed by Satch. Fig. 10 shows a Cobol
program named PAYROLL. This program reads in a file of records which specify the wage rate for each member of
a group of employees. The program computes the gross pay for each employee based on a 40 hour week along
with a count of the employees and the total gross pay for all the employees.

ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT HOURLY-WAGE-IN ASSIGN TO DA-2301-S-HWI.
    SELECT GROSS-PAY-OUT ASSIGN TO DA-2301-S-GPO.
    SELECT EMPLOYEE-COUNT-OUT ASSIGN TO DA-2301-S-ECO.
    SELECT TOTAL-GROSS-PAY-OUT ASSIGN TO DA-2301-S-TGPO.
DATA DIVISION.
FILE SECTION.
FD  hourly-wage-in
    LABEL RECORD IS OMITTED
    DATA RECORD IS hourly-wage-rec.
01  hourly-wage-rec.
    02 employee-number   PICTURE IS 9(9).
    02 hourly-wage       PICTURE IS 999V99.
FD  gross-pay-out
    LABEL RECORD IS OMITTED
    DATA RECORD IS gross-pay-rec.
01  gross-pay-rec.
    02 employee-number   PICTURE IS 9(9).
    02 gross-pay         PICTURE IS 999V99.
FD  employee-count-out
    LABEL RECORD IS OMITTED
    DATA RECORD IS employee-count-rec.
01  employee-count-rec.
    02 employee-count    PICTURE IS 9(6).
FD  total-gross-pay-out
    LABEL RECORD IS OMITTED
    DATA RECORD IS total-gross-pay-rec.
01  total-gross-pay-rec.
    02 total-gross-pay   PICTURE IS 9(7)V99.
PROCEDURE DIVISION.
initialization SECTION.
    MOVE ZERO TO total-gross-pay.
    MOVE ZERO TO employee-count.
    OPEN INPUT hourly-wage-in.
    OPEN OUTPUT gross-pay-out.
mainline SECTION.
    READ hourly-wage-in AT END GO TO end-of-job.
    MOVE employee-number OF hourly-wage-rec
        TO employee-number OF gross-pay-rec.
    MULTIPLY hourly-wage BY 40 GIVING gross-pay.
    ADD 1 TO employee-count.
    ADD gross-pay TO total-gross-pay.
    WRITE gross-pay-rec.
    GO TO mainline.
end-of-job SECTION.
    CLOSE hourly-wage-in.
    CLOSE gross-pay-out.
    OPEN OUTPUT employee-count-out.
    WRITE employee-count-rec.
    CLOSE employee-count-out.
    OPEN OUTPUT total-gross-pay-out.
    WRITE total-gross-pay-rec.
    CLOSE total-gross-pay-out.
    STOP RUN.

Fig. 10. The Cobol program PAYROLL.

Fig. 11 shows the Hibol translation which is produced by Satch. Like any Hibol program, this program is
divided into two parts which are closely analogous to the parts of a Cobol program. The data division of the Hibol
program specifies the data types of the flows (introduced by the keyword FILE) used in the program and how
these flows are indexed. The computation division specifies how the output flows are computed from the input
flows. The first line of the computation division specifies that the elements of the flow GROSS-PAY are computed
by multiplying the elements of the flow HOURLY-WAGE by 40. The second line of the computation division
specifies how to compute the single element flow TOTAL-GROSS-PAY. The operator SUM collapses a dimension of
a flow by adding all of the elements in that dimension together. In an analogous way, the third line of the
computation division specifies how to count the number of employees.


DATA DIVISION
KEY SECTION
    KEY EMPLOYEE-NUMBER FIELD TYPE IS NUMBER FIELD LENGTH IS 9
INPUT SECTION
    FILE HOURLY-WAGE KEY IS EMPLOYEE-NUMBER
OUTPUT SECTION
    FILE GROSS-PAY KEY IS EMPLOYEE-NUMBER
    FILE EMPLOYEE-COUNT
    FILE TOTAL-GROSS-PAY
COMPUTATION DIVISION
    GROSS-PAY IS (HOURLY-WAGE * 40.)
    TOTAL-GROSS-PAY IS (SUM OF (HOURLY-WAGE * 40.))
    EMPLOYEE-COUNT IS (COUNT OF HOURLY-WAGE)

Fig. 11. Satch's translation of Fig. 10 into Hibol.
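The correspondence between the Cobol loop and the Hibol statements can be sketched in Python. This is a hypothetical model, not Satch's internal representation. The first function follows Fig. 10 statement by statement. The second evaluates the three computation-division lines of Fig. 11 over a flow modeled as a dictionary keyed by EMPLOYEE-NUMBER. File I/O is omitted from both.

```python
def payroll_procedural(records):
    """Statement-by-statement reading of PAYROLL (Fig. 10), without file I/O."""
    gross_pay_out = []
    total_gross_pay = 0
    employee_count = 0
    for employee_number, hourly_wage in records:            # READ ... AT END GO TO end-of-job
        gross_pay = hourly_wage * 40                         # MULTIPLY hourly-wage BY 40
        employee_count += 1                                  # ADD 1 TO employee-count
        total_gross_pay += gross_pay                         # ADD gross-pay TO total-gross-pay
        gross_pay_out.append((employee_number, gross_pay))   # WRITE gross-pay-rec
    return dict(gross_pay_out), total_gross_pay, employee_count


def payroll_flows(hourly_wage):
    """The three Hibol statements of Fig. 11; a flow is a dict from key to value,
    and iteration over its elements is implicit in each line."""
    gross_pay = {emp: w * 40 for emp, w in hourly_wage.items()}  # GROSS-PAY IS (HOURLY-WAGE * 40.)
    total_gross_pay = sum(w * 40 for w in hourly_wage.values())  # SUM collapses the key dimension
    employee_count = len(hourly_wage)                            # COUNT OF HOURLY-WAGE
    return gross_pay, total_gross_pay, employee_count
```

Both functions return the same results for any set of records with distinct employee numbers, but only the second leaves the iteration implicit. That is what makes the Hibol program shorter.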


Without discussing Figs. 10 & 11 in any more detail, it can be seen that Satch is capable of creating quite good
Hibol translations of Cobol programs. (More complex examples are given in [10].) However, the translations
produced by Satch are still not optimal. For example, it would be better if Satch were capable of realizing that the
flow TOTAL-GROSS-PAY in Fig. 11 could be computed using the more compact expression (SUM OF GROSS-PAY).


Implementation  of  Satch 

Like the architecture of any translation system based on abstraction and reimplementation, Satch's architecture
is divided into two basic parts (see Fig. 12). The five modules on the left side of the figure operate together to
create an abstract description of the Cobol program supplied to Satch. The Hibol reimplementation module
creates a Hibol program based on the abstract description. Most of the burden of the translation is carried by the
abstraction modules. This asymmetry is due to the fact that the very high level nature of Hibol allows the abstract
description to be similar to the target language.

Fig. 12. The architecture of Satch.

The parsing module (implemented by G. Burke) parses the Cobol program and transliterates it into
pseudo-Lisp. (Lisp [24] was chosen as the output of this module in order to facilitate the use of a pre-existing plan
creation module.) The parsing module is implemented in essentially the same way that the transliteration
component of a Cobol to Lisp translator operating via transliteration and refinement would be implemented.

For each file in the Cobol program, the key determination module determines which of the fields of the file act
as keys. Various heuristics could be used to determine this information by looking at the Cobol program.
However, Satch currently asks the user to specify which fields are key fields. In ordinary use this would not lead
to an excessive amount of user interaction because key determination only has to be done once for each file even
if a large number of programs which operate on the files are being translated.

The plan creation module converts the pseudo-Lisp output of the parsing module into a programming
language independent internal representation called a surface plan. Fig. 13 (adapted from [10]) shows a simplified
version of the surface plan which Satch creates when operating on the Cobol program PAYROLL shown in Fig. 10.

A plan is similar to a data flow diagram. Computations are represented by boxes (called segments). The segments are connected by solid arrows indicating data flow and dashed arrows indicating control flow. In the figure, many of the data flow arrows have annotations indicating the variables they correspond to. The names of the segments represent the operations they perform. PLUS adds two numbers. CREAD reads a record from a file. EOFP determines whether the end of a file has been reached. PIF splits control flow based on whether or not its input is TRUE.

In the interest of brevity, the plan in Fig. 13 has been simplified in several ways. The computation of EMPLOYEE-COUNT has been omitted. The file open and close functions have been removed. Except for the file HOURLY-WAGE, the data flow corresponding to the various file objects has been omitted. The data flow for the file HOURLY-WAGE was retained in order to make the plan understandable.


into a hierarchy of segments within segments in order to highlight the logical structure of the plan. Second, the loops in the plan are identified and broken down into their component parts.

Fig. 14 shows a simplified grouped plan for the program PAYROLL. Like Fig. 13, the grouped plan omits the file open and close functions and some of the other file operations. The figure is also simplified in that it does not show the computation which occurs within the various segments. Unlike Fig. 13, the grouped plan shows the computation of EMPLOYEE-COUNT.

The key difference between Figs. 13 & 14 is the way the loop in the program PAYROLL is represented. In Fig. 14, the various parts of the loop are broken apart into segments which are connected by data flow rather than control flow. This is done through a process called temporal abstraction [27].

Temporal abstraction treats series of values in the loop (e.g., the successive values of HOURLY-WAGE) as if they were single data objects. These temporal series are represented by bold data flow arrows in Fig. 14. Temporal abstraction analyzes a loop as a set of generators and consumers which are sources and sinks for temporal series. For example, in Fig. 14, the generator CREAD creates a temporal series of HOURLY-WAGE values which are consumed by the segment TIMES(40). This segment in turn creates a temporal series of GROSS-PAY values which are summed up by the segment PLUS(SUM).
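Python generators give a rough analogy to temporal abstraction: a generator hands out the successive values of a loop as one series object that other functions consume. The sketch below is only an illustration under assumptions of our own: the functions `cread`, `times`, `count` and `total` are hypothetical stand-ins for the CREAD, TIMES(40), PLUS(COUNT) and PLUS(SUM) segments of Fig. 14, and an in-memory list of (employee number, hourly wage) pairs replaces the file HOURLY-WAGE-IN.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (EMPLOYEE-NUMBER-IN, HOURLY-WAGE).
Record = Tuple[int, float]


def cread(records: List[Record]) -> Iterator[Record]:
    """Generator (CREAD plus the EOFP test): yields each record until end of file."""
    for record in records:
        yield record


def times(series: Iterable[float], factor: float) -> Iterator[float]:
    """Consumes one temporal series and produces another, like TIMES(40)."""
    for value in series:
        yield value * factor


def count(series: Iterable[object]) -> int:
    """Consumer like PLUS(COUNT): counts the elements of a series."""
    n = 0
    for _ in series:
        n += 1
    return n


def total(series: Iterable[float]) -> float:
    """Consumer like PLUS(SUM): sums the elements of a series."""
    s = 0.0
    for value in series:
        s += value
    return s


# An in-memory stand-in for the file HOURLY-WAGE-IN.
hourly_wage_in: List[Record] = [(101, 10.0), (102, 12.5), (103, 8.0)]

# Each series is passed around as one object, not processed one cycle at a time.
wages = [wage for _, wage in cread(hourly_wage_in)]
gross_pay = list(times(wages, 40))
employee_count = count(cread(hourly_wage_in))
total_gross_pay = total(gross_pay)
print(gross_pay, employee_count, total_gross_pay)  # [400.0, 500.0, 320.0] 3 1220.0
```

Because each segment sees a whole series, the loop no longer has to be read one cycle at a time; this is the sense in which the grouped plan replaces control flow by data flow.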


[Figure: a TEMPORAL COMPOSITION segment containing the generator CREAD and the consumers CWRITE, TIMES(40), PLUS(COUNT), and PLUS(SUM).]

Fig. 14. A simplified grouped plan for PAYROLL.

As discussed in detail in [27], the process of temporal abstraction is based on the data flow in a loop


understood  in  isolation. 

Satch was implemented in the context of the Programmer's Apprentice project, and it shares many ideas with the rest of the project. In particular, the plan representation, the plan creation module, and the grouping module are borrowed directly from KBEmacs [28], which is the current demonstration system developed as part of the Programmer's Apprentice project.

The algorithm identification module inspects the grouped plan and determines the net effect of the computation being performed. In combination with the results of key determination, the results of algorithm identification form an abstract description of the program. Fig. 15 (adapted from [10]) shows the abstract description which is created for the program PAYROLL. The first part of Fig. 15 comes directly from the data division of the Cobol program, annotated by the key determination module. The second part of Fig. 15 comes from algorithm identification.

Algorithm identification operates in two stages. The first stage identifies what kinds of looping computations are present in the program. This is done by special purpose procedures which scan the grouped plan and recognize standard kinds of computation. In Fig. 14, these recognition procedures identify that the segments CREAD and EOFP enumerate the records in a file, while the segment CWRITE accumulates a series of records into a file. They also identify that the segment PLUS(SUM) computes a sum, while the segment PLUS(COUNT) computes a count. (The names of these segments in Fig. 14 reflect the fact that this recognition has been performed.) The recognition stage of the algorithm identification module makes it possible to use the terms "enumerate", "sum", and "count" in the abstract description to describe the computation in the loop instead of recurrence equations.
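A minimal sketch of what one such special-purpose recognition procedure might look like, assuming (hypothetically) that each loop-carried variable has already been summarized by its initial value and its update on every cycle. The `Recurrence` class and the matching rules are illustrative only, not Satch's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recurrence:
    """Hypothetical summary of one loop-carried variable:
    var := init before the loop; var := var <op> step on every cycle."""
    var: str
    init: int
    op: str
    step: str  # "1" for a constant increment, otherwise the series being added


def recognize(r: Recurrence) -> Optional[str]:
    """Special-purpose recognizers for two standard kinds of loop computation."""
    if r.op == "+" and r.init == 0 and r.step == "1":
        return "count"
    if r.op == "+" and r.init == 0:
        return "sum"
    return None  # unrecognized: would have to stay a recurrence equation


print(recognize(Recurrence("EMPLOYEE-COUNT", 0, "+", "1")))           # count
print(recognize(Recurrence("TOTAL-GROSS-PAY", 0, "+", "GROSS-PAY")))  # sum
```

Each new kind of loop computation needs another hand-written test of this sort, which is the lack of a generalized recognition facility noted below as a limitation of Satch.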

The second stage of algorithm identification computes summary descriptions of the computation performed by the program. This is done by means of a symbolic evaluator which traverses the plan and accumulates algebraic equations which describe the computation. For example, the symbolic evaluator determines that the field GROSS-PAY has the value "CREAD-VALUE(HOURLY-WAGE-IN, HOURLY-WAGE)*40", i.e., forty times the value of the HOURLY-WAGE field read from the file HOURLY-WAGE-IN. Similarly, it determines that the field TOTAL-GROSS-PAY accumulates the sum of the GROSS-PAY values. An algebraic simplifier is used in order to render the equations in as compact a form as possible.
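The symbolic evaluation and simplification steps can be sketched as substitution over expression trees followed by constant folding. This is a hypothetical miniature, not Satch's evaluator: expressions are nested Python tuples, only `*` and `+` are simplified, and the loop body is made up so that the constant 40 is computed indirectly and the simplifier has something to do.

```python
from typing import Dict, List, Tuple, Union

# An expression is a variable name, an integer constant, or (operator, arg, ...).
Expr = Union[str, int, tuple]


def substitute(expr: Expr, env: Dict[str, Expr]) -> Expr:
    """Replace variable names by the expressions most recently assigned to them."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(substitute(a, env) for a in args))
    return expr


def simplify(expr: Expr) -> Expr:
    """Tiny algebraic simplifier: folds integer constants, drops x*1 and x+0."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if op in ("*", "+") and len(args) == 2:
        a, b = args
        if isinstance(a, int) and isinstance(b, int):
            return a * b if op == "*" else a + b
        unit = 1 if op == "*" else 0
        if a == unit:
            return b
        if b == unit:
            return a
    return (op, *args)


def evaluate(assignments: List[Tuple[str, Expr]]) -> Dict[str, Expr]:
    """Walk straight-line assignments, accumulating one equation per variable."""
    env: Dict[str, Expr] = {}
    for target, expr in assignments:
        env[target] = simplify(substitute(expr, env))
    return env


# Made-up loop body in which the constant 40 is computed indirectly.
body = [
    ("WAGE", ("CREAD-VALUE", "HOURLY-WAGE-IN", "HOURLY-WAGE")),
    ("RATE", ("*", 8, 5)),
    ("GROSS-PAY", ("*", "WAGE", ("*", "RATE", 1))),
]
result = evaluate(body)
print(result["GROSS-PAY"])  # ('*', ('CREAD-VALUE', 'HOURLY-WAGE-IN', 'HOURLY-WAGE'), 40)
```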


FILES:

  HOURLY-WAGE-IN
    key-field   EMPLOYEE-NUMBER-IN   9(9)
    data-field  HOURLY-WAGE          999V99
  GROSS-PAY-OUT
    key-field   EMPLOYEE-NUMBER-OUT  9(9)
    data-field  GROSS-PAY            999V99
  EMPLOYEE-COUNT-OUT
    data-field  EMPLOYEE-COUNT       9(6)
  TOTAL-GROSS-PAY-OUT
    data-field  TOTAL-GROSS-PAY      9(7)V99

COMPUTATION:

  The main loop in the program enumerates the records in the file
  HOURLY-WAGE-IN. It terminates when EOFP(HOURLY-WAGE-IN).
  Fields written on each cycle of the main loop:
    EMPLOYEE-NUMBER-OUT = CREAD-VALUE(HOURLY-WAGE-IN, EMPLOYEE-NUMBER-IN)
    GROSS-PAY = CREAD-VALUE(HOURLY-WAGE-IN, HOURLY-WAGE)*40.
  Fields written after the main loop:
    EMPLOYEE-COUNT = count(NOT(EOFP(HOURLY-WAGE-IN)))
    TOTAL-GROSS-PAY = sum(CREAD-VALUE(HOURLY-WAGE-IN, HOURLY-WAGE)*40.)

Fig. 15. An abstract description of PAYROLL.


The reimplementation module of Satch produces a Hibol program based on the abstract description of the Cobol program. This is done by converting the equations in the abstract description into Hibol syntax. The only real complexity in this is checking that the program is expressible in Hibol. In particular, the reimplementation module has to check that each input file is processed in full and that the input keys map to the output keys in a way which is compatible with the implicit file reading and writing performed by Hibol.


Limits of Satch

Although it illustrates the efficacy of translation based on abstraction and reimplementation, Satch is limited in several ways. First of all, Satch is only a demonstration system. It has only been tested on a few examples and therefore has not been fully debugged. In addition, it is quite slow.

A more fundamental problem with Satch is that it is only applicable to a narrow class of Cobol programs. Part of this is due to the fact that, since Hibol is a relatively special purpose language, many Cobol programs cannot be reasonably translated into Hibol by any means. However, there are many Cobol programs which could in principle be translated into Hibol in a reasonable way but which cannot be translated by Satch. The basic difficulty is that Satch does not have a generalized recognition facility. Rather, special purpose procedures have to be written in order for Satch to be able to identify what kinds of looping computations are present in a program. Overcoming this difficulty is a primary goal of the knowledge-based translation system discussed in Section VI.


V - Cobbler: Translating from Pascal to Assembler Language

Duffey's proposed Cobbler system [9] uses translation via abstraction and reimplementation in order to compile Pascal [13] programs into PDP-11 assembler language [33]. Cobbler's goal is the creation of extremely efficient object code which is comparable in efficiency to the code which could be produced by an expert assembly language programmer. This is a level of efficiency which is beyond any existing compiler and is arguably beyond the abilities of any translator based on transliteration and refinement.

At first glance, it may seem surprising that Cobbler and Satch use the same approach to translation. After all, the problems associated with compiling Pascal do not seem to be very similar to the problems associated with translating Cobol to Hibol. In particular, the goal of the former is efficiency of low level output, while the goal of the latter is readability of high level output.

However, the two kinds of translation actually have a great deal in common. Stated generally, the key problem both systems face is that the quality criteria which govern the source are very different from the quality criteria which govern the target. In order to have the freedom to do a good job of satisfying the target criteria, the source must be analyzed and restated in an abstract way which frees it from the constraints of the source criteria.

Example  of  Cobbler's  Compilation 

Figs. 16 & 17 (adapted from [9]) show an example of how Cobbler is intended to operate. Fig. 16 shows a Pascal program which initializes a 4x4 array A of bytes to the identity matrix. The program does this a column at a time by setting each column element to zero and then changing the diagonal element to one. Fig. 17 shows the PDP-11 assembler code which would be produced by Cobbler.

var I: 1..4; J: 1..4;
    A: array[1..4, 1..4] of 0..255;
begin
  for J := 1 to 4 do
    begin
      for I := 1 to 4 do A[I,J] := 0;
      A[J,J] := 1
    end
end

Fig.  16.  The  Pascal  program  INITIALIZE. 


        MOV   #A,R3
        MOV   #3,R0
L1:     MOVB  #1,(R3)+
        CLRB  (R3)+
        CLRB  (R3)+
        CLRB  (R3)+
        CLRB  (R3)+
        DEC   R0
        BGT   L1
        MOVB  #1,(R3)


Fig. 17. Cobbler's compilation of Fig. 16.
The code in Fig. 17 is much more efficient than a simple literal translation of Fig. 16 into PDP-11 assembler.
The  optimizations  introduced  can  be  divided  into  two  categories:  algorithm  independent  optimizations  and 
changes  to  the  algorithm. 

The algorithm independent optimizations are improvements which any good optimizing compiler might make. The inner loop is unrolled in order to eliminate the overhead engendered by having a loop. The matrix A is operated on as a one dimensional vector in order to simplify address calculations. The outer loop is controlled by an auxiliary counter (R0) which counts down instead of up. This allows the code to take advantage of the fact that, on the PDP-11, comparison with zero is more efficient than comparison with other numbers. (After each arithmetic operation, condition codes are automatically set which specify whether the result is greater than, equal to, or less than zero.)
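To check that the code in Fig. 17 really does produce the identity matrix, the sketch below transcribes it into Python instruction by instruction. It assumes that A occupies 16 consecutive bytes in row major order and reads the branch at the bottom of the loop as BGT (branch if the result of DEC R0 is greater than zero), which makes the outer loop run three times.

```python
def run_fig17(memory: bytearray, base: int) -> None:
    """Instruction-by-instruction transcription of Fig. 17 (R3 = pointer, R0 = counter)."""
    r3 = base                  # MOV  #A,R3
    r0 = 3                     # MOV  #3,R0
    while True:
        memory[r3] = 1         # L1: MOVB #1,(R3)+
        r3 += 1
        for _ in range(4):     # the four unrolled CLRB (R3)+ instructions
            memory[r3] = 0
            r3 += 1
        r0 -= 1                # DEC R0 (sets the condition codes)
        if r0 <= 0:            # BGT L1 falls through once the result is not > 0
            break
    memory[r3] = 1             # MOVB #1,(R3)


mem = bytearray([0xFF] * 16)   # 16 bytes for a 4x4 array, pre-filled with junk
run_fig17(mem, 0)
rows = [list(mem[i * 4:(i + 1) * 4]) for i in range(4)]
print(rows)  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Pre-filling memory with 0xFF shows that every one of the 16 bytes is written, even though only 15 of them are touched inside the loop.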

For the most part, the optimizations above are straightforward. The first simply involves duplicating the inner loop body, and the second is essentially a strength reduction. However, introducing an auxiliary loop counter is somewhat more complex. If a loop counts from n up to m by s, then a new loop counter can be introduced which counts from (m-n)/s down to zero by one. Computation of the old counter is retained so that it can be used within the loop, while the new counter is used to control the loop. (In Fig. 17 no trace of this computation remains because the simplification of the addressing calculations has rendered it unnecessary.) The correctness of this transformation is supported by the fact that Pascal prohibits the body of a "for" loop from modifying the iteration variable or the bounds of the iteration.
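The auxiliary-counter transformation can be checked on small cases with a short sketch. Both functions below are illustrative; `counted_down_loop` computes the trip count (m-n)/s once with integer division and tests the new counter only against zero, while the old counter `i` is still maintained for the body's use.

```python
from typing import Callable, List


def original_loop(n: int, m: int, s: int, body: Callable[[int], None]) -> None:
    """for i := n to m by s do body(i), assuming s > 0."""
    i = n
    while i <= m:
        body(i)
        i += s


def counted_down_loop(n: int, m: int, s: int, body: Callable[[int], None]) -> None:
    """The same loop controlled by an auxiliary counter k running from (m-n)/s down to 0.

    Precomputing k is only safe because the body cannot change i, n, m or s."""
    i = n                  # old counter, kept because the body may use it
    k = (m - n) // s       # new counter; negative (zero trips) when m < n
    while k >= 0:          # the loop test is a comparison with zero
        body(i)
        i += s
        k -= 1


a: List[int] = []
b: List[int] = []
original_loop(1, 10, 3, a.append)
counted_down_loop(1, 10, 3, b.append)
print(a, b)  # [1, 4, 7, 10] [1, 4, 7, 10]
```

Precomputing the count is exactly the step that depends on the Pascal rule cited above: if the body could change i, m, or s, the two loops could diverge.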

In  order  to  highlight  the  algorithmic  changes  introduced  by  Cobbler,  Fig.  18  shows  a  decompilation  of  Fig.  17 
which  undoes  the  effects  of  the  algorithm  independent  optimizations  discussed  above  while  leaving  the 
algorithmic  changes  in  place.  It  should  be  noted  that  the  figure  is  merely  intended  as  a  presentational  device. 
There are a number of reasons why Fig. 18 is not a valid Pascal program. (Most notably, the matrix A is declared
to  have  different  bounds  from  those  which  are  presumably  associated  with  other  uses  of  the  matrix.) 


var I: 1..3; J: 2..5;
    A: array[1..3, 1..5] of 0..255;
begin
  for I := 1 to 3 do
    begin
      A[I,1] := 1;
      for J := 2 to 5 do A[I,J] := 0
    end;
  A[4,1] := 1
end

Fig. 18. A decompilation of Fig. 17 into pseudo-Pascal.


Comparison of Fig. 16 with Fig. 18 shows that the computation performed by the target code produced by Cobbler is startlingly different from the computation performed by the source code. In fact, it is probably not appropriate to say that the two pieces of code are using the same algorithm.

Three algorithmic changes have been introduced. The target code avoids redundantly setting the diagonal elements to zero before setting them to one. The target operates on A in row major order rather than column major order. The target treats A logically as a rectangular 3x5 matrix plus one additional element instead of as a square 4x4 matrix.

Perhaps the most important difference is the switch to row major order. For whatever reason, the programmer chose to use column major order in Fig. 16. This choice clashes with the fact that Pascal stores arrays in row major order. Switching to row major order changes the program so that it references the elements of A in memory storage order. This in turn makes it possible to use auto-increment mode PDP-11 instructions to support the address calculations required.
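The effect of iteration order on addressing can be seen by listing the byte offsets each order touches. The sketch assumes 1-based indices and 4 bytes per row, as for the array A of Fig. 16; `row_major_offset` and `steps` are illustrative helpers.

```python
from typing import List, Set


def row_major_offset(i: int, j: int, ncols: int = 4) -> int:
    """Byte offset of A[i,j] (1-based indices) in a row major array of bytes."""
    return (i - 1) * ncols + (j - 1)


def steps(order: List[int]) -> Set[int]:
    """Distinct differences between consecutive addresses in a traversal."""
    return {b - a for a, b in zip(order, order[1:])}


# Fig. 16 visits A column by column (outer loop on J, inner loop on I).
column_order = [row_major_offset(i, j) for j in range(1, 5) for i in range(1, 5)]
# Visiting the same elements row by row follows memory storage order.
row_order = [row_major_offset(i, j) for i in range(1, 5) for j in range(1, 5)]

print(sorted(steps(column_order)))  # [-11, 4]
print(sorted(steps(row_order)))     # [1]
```

Row major traversal always advances the address by exactly one byte, which is what the auto-increment addressing mode (R3)+ provides for free; the column major traversal needs strides of 4 and -11.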

Undoubtedly the most surprising change is the switch to operating on A as a 3x5 matrix. This makes it much easier to set the appropriate elements of A to one, since all these elements are now in the same column.

As will be discussed in the next subsection, Cobbler is able to make the algorithmic changes outlined above because it creates an abstract description of the program which is not constrained by the order of iteration in the loops, or even by the fact that A is declared to be a 4x4 matrix. These changes are arguably beyond the scope of any current optimizing compiler because they require an understanding of what is being computed by the source program as a whole.

If the programmer had written the program as shown in Fig. 18, then any good optimizing compiler could have produced the code in Fig. 17. However, it is implausible that the programmer would have written the program in a form anything like Fig. 18. This is of course partly due to the fact that it is not technically possible to write the program shown in Fig. 18 in Pascal. However, much more importantly, it is not desirable to write programs like Fig. 18. The programmer should not have to worry about detailed efficiency in the source code. Rather, readability should be the primary concern. The source program in Fig. 16 is preferable to the one in Fig. 18 because it is more readable and therefore easier to test, verify, and maintain. (One might argue that Fig. 16 would be even more readable if it operated in row major order. However, the fact that it operates on A as a 4x4 matrix clearly makes the program easier to understand than Fig. 18.)

Design  of  Cobbler 

As shown in Fig. 19, the architecture of Cobbler is similar to the architecture of Satch (see Fig. 12). In particular, the first three stages of abstraction (parsing, plan creation, and grouping) are identical, and are intended to make use of the same modules of KBEmacs. The difference between the lengths of the right hand sides of Figs. 12 & 19 is intended to indicate that creating an efficient PDP-11 implementation of an abstract description is much harder than creating a Hibol implementation.


Fig. 19. The architecture of Cobbler.


The  final  stage  of  abstraction  used  by  Cobbler  (algorithm  abstraction)  goes  beyond  the  algorithm  identification 
used  by  Satch.  The  goal  of  algorithm  abstraction  is  to  identify  the  various  design  decisions  which  were  used  when 
writing  the  Pascal  program  and  then  undo  them.  This  leads  to  a  hierarchy  of  abstract  descriptions  for  the  program 
which are constrained by fewer and fewer design decisions.

When analyzing the program in Fig. 16, the algorithm abstraction module first withdraws the decision to use loops when operating on A. This implicitly withdraws the decision to iterate in column major order as opposed to row major order. It then withdraws the decision to set the diagonal elements to zero before setting them to one. Finally, it withdraws the decision to implement A as a Pascal array as opposed to a non-contiguous group of variables. All of these steps could be performed by recognizing standard algorithms in a grouped plan for Fig. 16.

The left side of Fig. 20 summarizes the last step of algorithm abstraction. The 4x4 description represents the net effect of the program in Fig. 16 on the Pascal array A. The abstract description represents the net effect of the program operating directly on the individual matrix elements. The significance of the abstract description is that it gives Cobbler the freedom to consider ways of accessing A other than as a 4x4 array.


                 (abstract description)
        1000010000100001  =>  A1,1 ... A4,4
                /                        \
               /                          \
   (4x4 description)                (3x5 description)
      1 0 0 0                          1 0 0 0 0
      0 1 0 0  =>  A                   1 0 0 0 0  =>  A
      0 0 1 0                          1 0 0 0 0
      0 0 0 1                          1

Fig. 20. Some descriptions of Fig. 16 used by Cobbler.
the abstract description. As one of the first parts of the reimplementation process, Cobbler looks for patterns in the abstract description in order to decide how to use loops in the output program. The recurring pattern "1 0 0 0 0" is discovered. This causes Cobbler to reorganize its understanding of the program into the 3x5 description shown on the right hand side of Fig. 20.
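The paper does not say how Cobbler searches for such patterns. As a hypothetical stand-in, the sketch below looks for the shortest prefix of the flattened abstract description that repeats at least twice, with any leftover values forming a shorter tail.

```python
from typing import List, Optional, Tuple


def find_repeating_unit(values: List[int], min_repeats: int = 2
                        ) -> Optional[Tuple[List[int], int, List[int]]]:
    """Find the shortest prefix `unit` that repeats at least `min_repeats` times.

    Returns (unit, repeats, remainder) with values == unit * repeats + remainder,
    where remainder is a proper prefix of unit; returns None if there is none."""
    n = len(values)
    for p in range(1, n // min_repeats + 1):
        if all(values[i] == values[i - p] for i in range(p, n)):
            repeats = n // p
            return values[:p], repeats, values[repeats * p:]
    return None


# The abstract description: the 4x4 identity matrix in row major order.
identity = [1 if i == j else 0 for i in range(4) for j in range(4)]
print(find_repeating_unit(identity))  # ([1, 0, 0, 0, 0], 3, [1])
```

Applied to the 16 values of the identity matrix, it finds the unit "1 0 0 0 0" repeated three times plus one extra element, which is the 3x5-plus-one organization of the program.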

Once the 3x5 description has been created, reimplementation proceeds by investigating a variety of implementation options and then choosing a consistent and efficient set of these options. Following standard Pascal practice, the array A is implemented as a row major order sequence of consecutive bytes in memory. (This decision has to take the other uses of the matrix A into consideration.) Elements of A are addressed by stepping a pointer through memory. Since the inner loop which zeros the non-diagonal elements of A is very small and only iterates four times, it is unrolled into a sequence of four separate instructions. Clear-byte instructions are used to zero elements of A.

The key difficulty in making the above design decisions (and the other decisions which are required) is controlling the search process which investigates the various options. Flexibly and efficiently controlling search was the major focus of Duffey's research. He proposed the following approach to the problem.

A data base is used to represent Cobbler's evolving understanding of the implementation. Design decisions are represented in terms of transformations. Each transformation consists of a pattern and a procedural body. Transformations are triggered (causing their bodies to be executed) when their patterns match portions of the data base. The effect of a transformation is to modify the information in the data base, or add new information to the data base.
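A miniature version of this organization can be written as a forward-chaining rule system. Everything in the sketch is an assumption made for illustration: facts are tuples, a `Transformation` pairs a pattern (a predicate on one fact) with a body (a function returning new facts), and the two example rules loosely mirror the decision to unroll the small inner loop of Fig. 16. Unlike Cobbler's transformations, these only add facts; they never modify existing ones.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Set, Tuple

Fact = Tuple[Any, ...]


@dataclass
class Transformation:
    """Hypothetical transformation: a pattern over facts plus a procedural body."""
    name: str
    pattern: Callable[[Fact], bool]
    body: Callable[[Fact], Set[Fact]]  # returns facts to add to the data base


def run(database: Set[Fact], transformations: List[Transformation]) -> Set[Fact]:
    """Trigger transformations on matching facts until nothing new is added.

    Note that the data base passed in is updated in place."""
    changed = True
    while changed:
        changed = False
        for t in transformations:
            for fact in list(database):
                if t.pattern(fact):
                    new_facts = t.body(fact) - database
                    if new_facts:
                        database |= new_facts
                        changed = True
    return database


unroll = Transformation(
    "unroll-small-loops",
    # Facts of the form ("loop", name, iterations, instructions in body).
    pattern=lambda f: f[0] == "loop" and f[2] <= 4 and f[3] <= 1,
    body=lambda f: {("unrolled", f[1])},
)
drop_counter = Transformation(
    "no-counter-for-unrolled-loops",
    pattern=lambda f: f[0] == "unrolled",
    body=lambda f: {("needs-no-counter", f[1])},
)

db = run({("loop", "inner", 4, 1), ("loop", "outer", 3, 6)}, [unroll, drop_counter])
print(sorted(db))
```

The second rule fires only because the first one added a fact, which is the chaining behavior that makes controlling the order of triggering important.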

The key component of the knowledge-based reimplementation module is a conflict resolution monitor which controls the triggering of transformations. It exercises control principally by deactivating and activating groups of transformations. Associated with each group of transformations is a function which can create estimates of the costs in time and space associated with the design decision suggested by the group of transformations. (For a discussion of one way in which such estimates can be computed, see [14].) The conflict resolution monitor decides which groups of transformations to activate by comparing efficiency estimates.

An important feature of Cobbler is that it does not assume that it will always be able to make an informed choice between the design decisions it is faced with. In order to deal with this problem, Cobbler keeps a record of the design decisions which were used in the source program. In situations where Cobbler is not able to make an informed choice, it uses the relevant source program decision. For example, if no pattern had been found in the abstract description, Cobbler would have used the 4x4 structure suggested by the source program.
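The monitor's choice, including the fallback to the source program's decision, can be sketched as follows. The `DecisionGroup` class, the tie-breaking rule, and the cost figures are invented for illustration; they are not taken from Duffey's design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DecisionGroup:
    """Hypothetical group of transformations that together make one design decision."""
    name: str
    estimate: Callable[[], Optional[float]]  # estimated cost, or None if unknown


def choose(groups: List[DecisionGroup], source_decision: str) -> str:
    """Activate the group with the lowest estimated cost. When no informed choice
    is possible (no estimates, or a tie involving the source program's decision),
    fall back on the decision used in the source program."""
    known: List[Tuple[float, str]] = []
    for g in groups:
        cost = g.estimate()
        if cost is not None:
            known.append((cost, g.name))
    if not known:
        return source_decision
    best = min(cost for cost, _ in known)
    winners = [name for cost, name in known if cost == best]
    if source_decision in winners:
        return source_decision
    return winners[0]


# Made-up cost estimates, in arbitrary units.
print(choose([DecisionGroup("3x5", lambda: 16.0), DecisionGroup("4x4", lambda: 20.0)], "4x4"))  # 3x5
print(choose([DecisionGroup("3x5", lambda: None)], "4x4"))  # 4x4
```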

It would also be possible for Cobbler to take advice on how to compile a program, because Cobbler's processing is based on design decisions which are comprehensible to a programmer.

The discussion above shows how Cobbler is intended to operate. However, Cobbler is not a running system. With the exception of parts of the reimplementation component, no attempt has been made to implement
VI - The Knowledge-Based Translator

Work is currently underway in the Programmer's Apprentice project on the components of a general purpose, knowledge-based translator operating via abstraction and reimplementation. An important virtue of this system is that much of its knowledge of translation will be represented as data rather than procedures. As a result, it will be possible to readily extend the system to cover a wide range of source and target languages.

In order to understand how the knowledge-based translator will operate, it is first necessary to discuss two of the key ideas which underlie the Programmer's Apprentice (see [28]). The first idea is the concept of a cliche. Programs are not constructed out of arbitrary combinations of primitive programming constructs. Rather, programs are built up by combining standard computational fragments and data structure fragments. These standard fragments are referred to as cliches and form the heart of the Programmer's Apprentice's understanding of programming, just as they form the heart of any person's understanding of programming.

As an example of cliches, consider the Cobol program PAYROLL in Fig. 10. This program contains a number of cliches which can be named and described as follows. The data cliche keyed-sequential-Cobol-file specifies how a series of records with keys can be combined into a file. The computational cliche enumerate-keyed-sequential-Cobol-file enumerates all of the records in a file, taking care of opening and closing the file. The computational cliche accumulate-keyed-sequential-Cobol-file writes out a series of records into a file, taking care of opening and closing the file. The computational cliche Cobol-sum computes the sum of a sequence of numbers.

A crucial feature of cliches is that they can be arranged in a multi-level specialization hierarchy as shown in Fig. 21. The descendants of a cliche in this hierarchy are more specialized cliches which specify how the cliche should be adapted in various specific situations. For example, there is an abstract cliche enumerate which has a set of descendants which specify how to enumerate various kinds of data structures (e.g., enumerate-file and enumerate-vector). Similarly, the middle level cliche enumerate-file has a set of descendants which specify how to enumerate different types of files (e.g., enumerate-indexed-file and enumerate-keyed-sequential-file). Going one step further, each of these specific file enumeration cliches has a set of descendants which specify exactly what functions are used to open, close, and read files in various different programming language environments (e.g., enumerate-indexed-Ada-file and enumerate-keyed-sequential-Cobol-file).
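One simple way to represent such a specialization hierarchy is a table of parent links, from which the chain of more abstract cliches above any cliche can be read off. This is a deliberately simplified, hypothetical encoding using the cliche names given above; the plan calculus records far more than a parent name.

```python
from typing import Dict, List, Optional

# Parent links: each cliche maps to the cliche it specializes.
PARENT: Dict[str, Optional[str]] = {
    "enumerate": None,
    "enumerate-file": "enumerate",
    "enumerate-vector": "enumerate",
    "enumerate-indexed-file": "enumerate-file",
    "enumerate-keyed-sequential-file": "enumerate-file",
    "enumerate-indexed-Ada-file": "enumerate-indexed-file",
    "enumerate-keyed-sequential-Cobol-file": "enumerate-keyed-sequential-file",
}


def ancestors(cliche: str) -> List[str]:
    """The cliches this one specializes, from nearest to most abstract."""
    chain: List[str] = []
    parent = PARENT[cliche]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_specialization_of(specific: str, general: str) -> bool:
    """True if `specific` lies somewhere below `general` in the hierarchy."""
    return general in ancestors(specific)


print(ancestors("enumerate-keyed-sequential-Cobol-file"))
# ['enumerate-keyed-sequential-file', 'enumerate-file', 'enumerate']
```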


                          enumerate
                        /     |     \ ...
           enumerate-file          enumerate-vector
             /          \
enumerate-indexed-file    enumerate-keyed-sequential-file
          |                            |
enumerate-indexed-Ada-file    enumerate-keyed-sequential-Cobol-file

Fig. 21. Examples of specialization relationships between cliches.
A second key idea which underlies the Programmer's Apprentice is the plan representation, which was discussed briefly in Section IV. The most important feature of a plan is that it is an abstract representation of a program which captures the key features of the computation while ignoring the syntactic details of particular programming languages. For example, data flow is represented by simple arcs in the plan for a program no matter how it is implemented in the program (e.g., via variables or parameter passing or nesting of expressions).
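The independence of plans from how data flow is written can be demonstrated with Python's standard `ast` module. The hypothetical `data_flow_arcs` function below handles only straight-line assignments of (possibly nested) calls, names each segment after the function it calls, and treats unassigned names as inputs. Written with nested calls or with an intermediate variable, the same computation yields the same set of arcs.

```python
import ast
from typing import Dict, Set, Tuple

Arc = Tuple[str, str]  # (producer, consumer)


def data_flow_arcs(source: str) -> Set[Arc]:
    """Data flow arcs for straight-line code whose statements are simple assignments.

    Each call becomes a segment named after the called function (so this sketch
    assumes each function is called at most once); names that are never assigned
    are treated as program inputs."""
    producer: Dict[str, str] = {}   # variable -> segment whose output it holds
    arcs: Set[Arc] = set()

    def visit(node: ast.expr) -> str:
        if isinstance(node, ast.Name):
            return producer.get(node.id, node.id)
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            for arg in node.args:
                arcs.add((visit(arg), node.func.id))
            return node.func.id
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are supported")
        producer[stmt.targets[0].id] = visit(stmt.value)
    return arcs


nested = "gross = times(cread(wage_file), 40)"
with_variables = "wage = cread(wage_file)\ngross = times(wage, 40)"
print(data_flow_arcs(nested) == data_flow_arcs(with_variables))  # True
```

The intermediate variable `wage` leaves no trace in the arcs, just as a plan records only that the output of one segment flows into another.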

Both Satch and Cobbler make use of the version of the plan representation which is used by KBEmacs. Since the design of those systems, Rich [17], [18] has developed an extended plan representation called the plan calculus, which is capable of representing much more information about a program. In particular, the plan calculus is capable of representing data cliches and the specialization relationships between cliches. In contrast, the plan representation used by KBEmacs is only capable of representing computational cliches, and only in isolation from each other.


Design of the Knowledge-Based Translator


Fig. 22 shows the way in which plans and cliches can be used as the basis for a knowledge-based translator operating via abstraction and reimplementation. The modules on the left side of the diagram support abstraction. The modules on the right side of the diagram support reimplementation. The key component of the system is a library of cliches like the ones described above. Specialization relationships are used as the basis for the organization of the library.


[Figure: a CLICHE LIBRARY at the center of the system. Abstraction modules (left): PARSING, PLAN CREATION, RECOGNITION, CLICHE ABSTRACTION. Reimplementation modules (right): CLICHE SPECIALIZATION, CODING.]

Fig. 22. Translation based on cliches and plans.


The first two steps of abstraction (parsing and plan creation) are exactly the same as in Satch and Cobbler. The last two steps of abstraction (recognition and cliche abstraction) are similar to Cobbler's algorithm abstraction step. A key feature of these modules is that they are data driven, operating based on the cliches stored in the


The recognition module scans the grouped plan and determines what source language cliches were used to construct the source program. (This recognition is performed directly on the surface plan and therefore subsumes the grouping performed in Satch and Cobbler.) When applied to the Cobol program in Fig. 10, recognition would reveal that the program was composed of cliches such as keyed-sequential-Cobol-file, enumerate-keyed-sequential-Cobol-file, accumulate-keyed-sequential-Cobol-file, Cobol-count, and Cobol-sum.

The cliche abstraction module creates an abstract plan by replacing specialized plans with the more abstract ones they are specializations of. In the example above, this would yield a plan involving the abstract cliches keyed-sequence, enumerate, accumulate, count, and sum.

The abstract plan attempts not to force any design decisions. It simply states that there are certain sequences containing certain data values and keys, and that various operations are performed on these values. The important feature of the abstract plan is that it is completely neutral between the Cobol program which implements the sequences as files and a Hibol program which implements them as flows or, for that matter, a Lisp program which implements them as lists.

The reimplementation process in Fig. 22 operates in the reverse of the way in which abstraction operates. Cliche specialization selects cliches which specialize the cliches in the abstract plan in a way which is appropriate for the target language. Cliche specialization (which can be looked at as library-driven synthesis) is the inverse of cliche abstraction. However, it is more difficult than cliche abstraction because it is harder to make design decisions than to discard them.

Coding creates program text corresponding to the specialized cliches which are selected by cliche specialization. Coding is the inverse of parsing, plan creation, and recognition. Inverting recognition and parsing is easy. However, inverting plan creation is difficult, because information corresponding to the information thrown away by plan creation must be generated. For example, the coding module has to decide how to render data flow aesthetically in the target language using variables and nesting of expressions.
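As an illustration of that last decision, the Python sketch below shows one simple policy a coding module might use when turning a data flow graph into expression text: a computed value consumed exactly once is nested inline, while a value consumed more than once is bound to a variable. The graph encoding, the policy, and the names `Node` and `render` are assumptions made for illustration; they are not taken from KBEmacs or from the translator described here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in a data flow graph (hypothetical encoding)."""
    name: str                                  # variable name if the value is stored
    op: str                                    # binary operator, or the symbol of a leaf
    args: list = field(default_factory=list)   # input nodes; empty for leaves


def render(outputs):
    """Render the graph feeding `outputs` as assignment statements.

    Assumed policy: a computed value used by exactly one consumer is nested
    inline; a value used more than once is bound to a variable first.
    """
    uses, order = {}, []

    def visit(n):
        if id(n) in uses:          # already seen: just count the extra use
            uses[id(n)] += 1
            return
        uses[id(n)] = 1
        for a in n.args:
            visit(a)
        order.append(n)            # post-order: inputs before their consumers

    for out in outputs:
        visit(out)

    bound = set()

    def expr(n):
        if not n.args:
            return n.op            # leaf: an input variable or constant
        if id(n) in bound:
            return n.name          # value already stored in a variable
        left, right = (expr(a) for a in n.args)
        return f"({left} {n.op} {right})"

    output_ids = {id(o) for o in outputs}
    lines = []
    for n in order:
        if n.args and (uses[id(n)] > 1 or id(n) in output_ids):
            lines.append(f"{n.name} = {expr(n)}")
            bound.add(id(n))
    return lines


if __name__ == "__main__":
    a, b, c, n = (Node(x, x) for x in "abcn")
    total = Node("total", "+", [a, b])
    print(render([Node("avg", "/", [total, n]), Node("diff", "-", [total, c])]))
```

In the example, `total` is bound to a variable because both outputs consume it; if `avg` were the only output, the same value would be nested and the sketch would emit `avg = ((a + b) / n)`.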


Implementing the Knowledge-Based Translator

Progress has been made toward implementing most of the components in Fig. 22. However, none of these components has yet been completed. Rich and Feldman are currently in the process of implementing the plan calculus together with a general purpose automatic deduction system [19] to support reasoning in it. Extensive work has already been done on designing the library [17].

Given a particular source language, it is not difficult to implement a parsing module. As mentioned above, the plan creation module already exists as part of KBEmacs. This module has to be rewritten so that it operates in the context of the plan calculus. However, there should be no particular difficulty in doing this. KBEmacs also contains a coding module analogous to the one needed by the knowledge-based translator. Although there are many improvements which need to be made in this module, it should not be difficult to construct an adequate coding module which operates in the context of the plan calculus.
should be straightforward. Cliche abstraction is driven by the specialization links in the cliche library. Cliche abstraction is particularly easy because it follows these links in the many-to-one direction.
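A minimal Python sketch of why the many-to-one direction is easy: if each specialized cliche records the single, more abstract cliche it specializes, abstraction reduces to following those links upward, with no choices to make. The link table reuses the cliche names from the Cobol example above, but the links themselves and the function names are assumptions for illustration, not the contents of the actual cliche library.

```python
# Hypothetical specialization links: specialized cliche -> the cliche it specializes.
SPECIALIZES = {
    "keyed-sequential-Cobol-file": "keyed-sequence",
    "enumerate-keyed-sequential-Cobol-file": "enumerate",
    "accumulate-keyed-sequential-Cobol-file": "accumulate",
    "Cobol-count": "count",
    "Cobol-sum": "sum",
}


def abstract(cliche, links=SPECIALIZES):
    """Follow specialization links upward until reaching a cliche with no parent.

    Each cliche has at most one parent, so no decisions are needed.
    """
    while cliche in links:
        cliche = links[cliche]
    return cliche


def abstract_plan(recognized):
    """Replace each recognized cliche by its most abstract ancestor."""
    return [abstract(c) for c in recognized]


if __name__ == "__main__":
    print(abstract_plan(["keyed-sequential-Cobol-file", "Cobol-count", "Cobol-sum"]))
```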

There has also been no attempt to implement the cliche specialization module. Like cliche abstraction, cliche specialization is driven by the specialization links in the cliche library. However, cliche specialization is harder than cliche abstraction because numerous design decisions have to be made when choosing a path through the specialization links in the one-to-many direction. It is expected that, like Cobbler, the cliche specialization module will use a variety of estimates and heuristics in order to make design decisions. Also like Cobbler, design decisions detected during cliche abstraction will be used to guide cliche specialization in situations where these heuristics fail to be applicable.
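Going in the one-to-many direction, each abstract cliche may have several specializations, so a choice is required. The Python sketch below picks the specialization for the target language with the lowest cost estimate, and falls back on a decision recorded during abstraction when no estimate applies. The candidate table, the cost figures, and the fallback order are hypothetical; they only illustrate the shape of the decision, not Cobbler's actual heuristics.

```python
# Hypothetical one-to-many specialization links:
# abstract cliche -> list of (specialized cliche, target language, cost estimate or None)
CANDIDATES = {
    "keyed-sequence": [
        ("Hibol-flow", "Hibol", 1.0),
        ("Lisp-list", "Lisp", None),
        ("Lisp-vector", "Lisp", None),
    ],
}


def specialize(abstract_cliche, target, recorded=None):
    """Choose one specialization of `abstract_cliche` suitable for `target`.

    1. Prefer the candidate with the lowest cost estimate.
    2. If no candidate has an estimate, use the decision recorded during
       cliche abstraction (`recorded`), when it names a valid candidate.
    3. Otherwise fall back on the first candidate listed.
    """
    options = [(name, cost) for name, lang, cost in CANDIDATES.get(abstract_cliche, [])
               if lang == target]
    if not options:
        raise ValueError(f"no {target} specialization of {abstract_cliche}")
    estimated = [(cost, name) for name, cost in options if cost is not None]
    if estimated:
        return min(estimated)[1]
    if recorded in {name for name, _ in options}:
        return recorded
    return options[0][0]


if __name__ == "__main__":
    print(specialize("keyed-sequence", "Hibol"))
    print(specialize("keyed-sequence", "Lisp", recorded="Lisp-vector"))
```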

In many ways, the central module in Fig. 22 is the recognition module. Work on this module has been underway for several years. Recognition can be viewed as a parsing task. From this viewpoint, the cliche library is a grammar which can be used to derive plans for programs. In order to determine which cliches were used to construct a given plan, one needs to parse the plan. This would be a straightforward task if it were not for the fact that the plan for a program is a graph rather than a string, and cliche instances correspond to subgraphs in the plan rather than substrings.
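The difference between parsing strings and parsing plans can be made concrete with a deliberately naive Python sketch. A plan is represented as labeled nodes plus data flow edges, and a cliche as a small labeled pattern graph; recognizing an instance means finding an injective mapping of pattern nodes to plan nodes that preserves labels and edges. In the example, the enumerate and sum operations are not adjacent in the textual order of the program, so the instance is not a substring, but it is a subgraph. The brute-force search is exponential and is only an illustration of the problem, not an efficient flow graph parser; the plan and cliche shown are hypothetical.

```python
from itertools import permutations


def find_instances(plan_labels, plan_edges, pat_labels, pat_edges):
    """Return every mapping pattern-node -> plan-node that preserves labels and edges.

    plan_labels / pat_labels: dict node -> operation label
    plan_edges / pat_edges:   set of (source, destination) data flow edges
    """
    pat_nodes = list(pat_labels)
    found = []
    for image in permutations(plan_labels, len(pat_nodes)):
        m = dict(zip(pat_nodes, image))
        labels_ok = all(pat_labels[p] == plan_labels[m[p]] for p in pat_nodes)
        edges_ok = all((m[s], m[d]) in plan_edges for s, d in pat_edges)
        if labels_ok and edges_ok:
            found.append(m)
    return found


# Hypothetical plan: statements 1-4 in textual order, with data flow edges.
PLAN_LABELS = {1: "enumerate", 2: "count", 3: "sum", 4: "print"}
PLAN_EDGES = {(1, 2), (1, 3), (2, 4), (3, 4)}

# Hypothetical cliche: an enumeration whose output feeds a sum.
CLICHE_LABELS = {"e": "enumerate", "s": "sum"}
CLICHE_EDGES = {("e", "s")}

if __name__ == "__main__":
    print(find_instances(PLAN_LABELS, PLAN_EDGES, CLICHE_LABELS, CLICHE_EDGES))
```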

As a first step toward solving the recognition problem, Brotsky [6] implemented a parser which is able to efficiently parse flow graphs (a restricted form of acyclic directed graph) given a flow graph grammar. Currently, Zelinka [29] is implementing an experimental recognition module which utilizes this graph parser. Further research is required in order to develop effective methods whereby the knowledge-based translator can deal with incomplete recognition.

Once the implementation of the components described above has been completed, it will be possible to use them to construct a general purpose, knowledge-based translator. As mentioned above, a key feature of this system is that it will be data driven, with most of its knowledge embedded in the cliche library. Additional research will have to be performed in order to discover how best to represent the heuristics which are an essential part of the specialization component and, to a lesser extent, of the coder component.


VII - Related Work

There are several areas where active work is in progress on translators. However, essentially all current translators operate via transliteration and refinement. Some translators (e.g., optimizing compilers) do a significant amount of global analysis of the source program. However, it is not clear that any program translator takes the step of attempting to obtain an abstract understanding of the computation being performed by the program as a whole.

Compilers

Compilers are the most common example of translators. They have been well developed over the years and work quite well. As discussed in textbooks on compilers (e.g., [1]), the prototypical compiler operates by transliteration and refinement. The source language is transliterated (via parsing and syntax-directed translation) into an intermediate language which is analogous to a machine language. Refinements (optimizations) are then applied to this intermediate representation. Finally, the intermediate language is transliterated into the actual target language. Current developments in compiler research [30] indicate that the basic approach to compilation outlined above is still adhered to.

However, over the years, two trends in compiler research have been moving in the direction of abstraction and reimplementation. One trend is the development of intermediate representations which look more like data flow diagrams and less like particular machine languages. These more abstract representations facilitate the construction of families of compilers which produce output for a variety of target machines. They also facilitate the manipulation of the program when optimizations are being applied. In particular, they make it easier to keep track of the data flow in a program.

Another trend is toward more powerful optimizations which require a greater understanding of what is going on in a program. Classic peephole optimizations, such as locating patterns of instructions for which a special target instruction is available, operate in a very local way without any understanding of context. More powerful optimizations, such as removing an invariant expression from a loop, require a general understanding of the surrounding data flow and control flow. Optimizations such as strength reduction additionally require an understanding of the mathematical properties of the basic operators (e.g., "+" and "*").
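The two kinds of optimization just mentioned can be illustrated with a small Python sketch. In `original`, the product `scale * offset` does not change inside the loop, and `i * step` is recomputed with a multiplication on every iteration. `optimized` hoists the invariant product out of the loop and replaces the multiplication with an addition carried from one iteration to the next, relying on the identity `(i + 1) * step == i * step + step`. The functions and the loop are invented for illustration; a compiler would perform these rewrites on its intermediate representation rather than on source text.

```python
def original(n, scale, offset, step):
    out = []
    for i in range(n):
        # scale * offset is loop invariant; i * step is an induction expression
        out.append(scale * offset + i * step)
    return out


def optimized(n, scale, offset, step):
    base = scale * offset   # invariant expression hoisted out of the loop
    term = 0                # tracks i * step without multiplying
    out = []
    for _ in range(n):
        out.append(base + term)
        term += step        # strength reduction: multiplication replaced by addition
    return out


if __name__ == "__main__":
    print(original(3, 2, 5, 7), optimized(3, 2, 5, 7))
```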

The kind of analysis which underlies complex optimizations is a step toward creating an abstract summary of the program being compiled. However, it is only a small step in this direction because the information obtained by analysis is not very abstract. The only abstraction is away from particular data flow and control flow constructs. In addition, the analysis is narrow in scope, aiming only to gather enough information to answer a few specific questions about the program. No attempt is made to obtain a general understanding of the computation performed by the program.

Compiling  for  Parallel  Machines 

The problem of compiling a conventional programming language so that it runs efficiently on a parallel machine highlights the strengths and weaknesses of current approaches to optimization. Consider compiling the Fortran program fragment in Fig. 23 for a vector machine. The fragment is a triply nested loop which computes the product of two NxN matrices.

      DO 100 J = 1, N
      DO 100 I = 1, N
      DO 100 K = 1, N
      C(I,J) = C(I,J) + A(I,K)*B(K,J)
  100 CONTINUE

Fig. 23. Loops performing matrix multiplication.

The loops in Fig. 23 can be efficiently executed on a scalar machine. Unfortunately, they cannot be efficiently executed on the typical vector machine. The problem is that each cycle (after the first) of the innermost loop uses the value computed on the prior cycle, leaving little room for vectorization. However, if the loops are interchanged so that the K loop is outermost, then they can be efficiently executed on a vector machine.
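To make the dependence concrete, the Python sketch below runs the Fig. 23 loops in the original J, I, K order and with K moved outermost (K, J, I). In the original order, every iteration of the innermost K loop reads the C(I,J) value written by the previous iteration, so those iterations form a chain. With K outermost, the innermost loop updates C(1,J), C(2,J), ..., C(N,J), which are distinct elements, so that loop could be issued as a single vector operation. Plain Python lists stand in for Fortran arrays, with 0-based indices, and no vectorization actually happens here; the sketch only shows that the two orders compute the same product.

```python
def matmul_jik(A, B, n):
    """Loop order of Fig. 23: the K loop is innermost."""
    C = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                # each k iteration reads the C[i][j] written by the previous one
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


def matmul_kji(A, B, n):
    """Interchanged order: the K loop is outermost."""
    C = [[0] * n for _ in range(n)]
    for k in range(n):
        for j in range(n):
            # the i iterations update distinct elements C[0][j] .. C[n-1][j]
            for i in range(n):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_jik(A, B, 2), matmul_kji(A, B, 2))
```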

The discussion in [2] shows how a compiler for a vector machine can automatically interchange loops in order to improve the efficiency of the code produced. Interchanging two loops changes the order in which computations are performed. Many subcomputations which were performed in the order S1 S2 before the interchange will be performed in the order S2 S1 after the interchange. An interchange is correctness preserving as long as nothing in the original program either requires that S2 follow S1 or prohibits S2 from preceding S1.
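The condition above is commonly checked with dependence direction vectors. The Python sketch below uses a textbook formulation that may differ in detail from the one in [2]. Each dependence in a loop nest is summarized by a tuple of directions, one per loop from outermost to innermost: "<" means the dependent iteration comes later in that loop, "=" means the same iteration, and ">" means earlier. A dependence is consistent with execution order only if its first non-"=" entry is "<". Interchanging two loops swaps the corresponding entries, and the interchange is legal if every swapped vector is still consistent, that is, no dependence would be reversed. For Fig. 23 (loops J, I, K), C(I,J) is updated at the same I and J on successive K iterations, giving the vector ("=", "=", "<").

```python
def is_valid(direction):
    """True if the first non-'=' direction is '<' (or every entry is '=')."""
    for d in direction:
        if d != "=":
            return d == "<"
    return True


def interchange_is_legal(directions, a, b):
    """Check whether swapping loops a and b preserves every dependence.

    directions: iterable of direction vectors, outermost loop first.
    """
    for vec in directions:
        swapped = list(vec)
        swapped[a], swapped[b] = swapped[b], swapped[a]
        if not is_valid(swapped):
            return False
    return True


if __name__ == "__main__":
    fig23 = [("=", "=", "<")]   # loops J, I, K
    print(interchange_is_legal(fig23, 0, 2))   # move K outermost
```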

A global analysis of the loops in question is a key part of the loop interchange optimization. The compiler must obtain an understanding of the data dependencies between array elements in the loops. This requires an understanding of the data flow involving the arrays (i.e., A, B, and C). It also requires at least a partial understanding of the interaction between the loop iteration variables and the index expressions which select array elements.

In Fig. 23, the index expressions are very easy to understand. However, the index expressions in a loop can be arbitrarily complex. For example, they may be functions of the input data. The analysis of index expressions used by the loop interchange optimization described in [2] is limited to situations where the index expressions are linear functions of the loop iteration variables.
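When index expressions are linear, simple arithmetic tests can rule dependences out. One classic example, sketched below in Python, is the GCD test; it is offered as an illustration of this kind of analysis and is not necessarily the test used in [2]. A write to X(a*i + b) and a read of X(c*j + d) can refer to the same element only if the equation a*i - c*j = d - b has an integer solution, which requires gcd(a, c) to divide d - b. The test ignores loop bounds, so a result of True means only that a dependence is possible.

```python
from math import gcd


def may_depend(a, b, c, d):
    """GCD test for the subscripts a*i + b (write) and c*j + d (read).

    Returns False when the two references can never touch the same element.
    """
    g = gcd(a, c)
    if g == 0:              # both coefficients zero: two constant subscripts
        return b == d
    return (d - b) % g == 0


if __name__ == "__main__":
    print(may_depend(2, 0, 2, 1))   # X(2*i) vs X(2*j + 1): evens never meet odds
    print(may_depend(2, 0, 4, 6))   # X(2*i) vs X(4*j + 6): overlap is possible
```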

An interesting aspect of loop interchange in particular, and compiler optimizations in general, is that they are deliberately designed to be narrow in scope and independent of whatever computation is being performed. This has the advantage that the various optimizations can be applied in a wide variety of contexts without the need for any special knowledge about the particular algorithms being used. However, it has the disadvantage that the optimizations cannot utilize special knowledge about the particular algorithms being used.

Given the algorithm-independent nature of optimizations in general, the level of object code efficiency which can be achieved is very impressive. However, there are definite limits to the efficiency which can be achieved. For example, consider compiling Fig. 23 for a highly parallel machine which has many independent processors. For this kind of machine, optimizations such as loop interchange are not sufficient to produce efficient code. The problem is that for a multiple processor machine, the standard matrix multiplication algorithm is simply the wrong algorithm to use. Special algorithms for matrix multiplication have been developed which are much more efficient when run on a multiple processor machine.

In order to create really good code for a multiple processor machine, a compiler would have to recognize that matrix multiplication was being performed in Fig. 23 and then replace the standard algorithm with one of its multiple processor counterparts. The lack of compilers which can make this kind of transformation significantly limits the usability of multiple processor machines. In order to make full use of these machines, programmers have to rewrite their programs in special languages using new algorithms.
A third category of compilers is compilers for general purpose very high level languages. A number of such languages have been designed (e.g., SETL [22], Gist [11], and V [12]). These languages differ from high level languages in that they are more abstract. A good example of this difference is the treatment of data structures.

High level languages provide facilities so that the programmer can specify the exact details of how data structures should be implemented. In contrast, very high level languages typically support only a few universal data structures such as sets and mappings. All decisions about how to implement a given set or mapping efficiently are left up to the very high level language compiler. This simplifies what the programmer has to do by removing large parts of the programming task from consideration.
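The Python sketch below illustrates what leaving the implementation of sets to the compiler means. The program `distinct_count` is written against one abstract set interface, and a selection step, here a hypothetical rule keyed on whether the elements are known to be small non-negative integers, picks a concrete representation: a bit vector packed into an integer, or a hash table. Neither the rule nor the class names come from any particular very high level language compiler.

```python
class BitVectorSet:
    """Set of small non-negative integers packed into the bits of one int."""

    def __init__(self):
        self.bits = 0

    def add(self, x):
        self.bits |= 1 << x

    def __contains__(self, x):
        return bool((self.bits >> x) & 1)


class HashSet:
    """General-purpose set backed by Python's built-in hash table."""

    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)

    def __contains__(self, x):
        return x in self.items


def select_set_implementation(small_int_universe):
    """Hypothetical selection rule a compiler might apply."""
    return BitVectorSet() if small_int_universe else HashSet()


def distinct_count(values, small_int_universe):
    """Abstract program: count distinct values using 'a set'."""
    seen = select_set_implementation(small_int_universe)
    count = 0
    for v in values:
        if v not in seen:
            seen.add(v)
            count += 1
    return count


if __name__ == "__main__":
    print(distinct_count([3, 1, 3, 7, 1], True), distinct_count(["a", "b", "a"], False))
```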

Unfortunately, constructing a compiler for a general purpose very high level language which produces efficient object code has proved very difficult. While these compilers are the subject of active research, it is not clear that such a compiler can be said to exist even in a research setting.

The SETL compiler [11] is implemented more or less along traditional lines with the addition of a special component which selects data structure implementations. However, the key technique which is being pursued as a basis for very high level language compilers is refinement through transformation [1], [12]. In this approach, a very high level language source program is progressively refined into an efficient target program by applying a sequence of correctness-preserving transformations. The net effect of the transformations is to replace all of the abstract concepts (e.g., set) in the source with concrete concepts (e.g., record or array) in the target. The key problem (which has so far resisted solution) is that there are a vast number of ways in which a source program can be transformed, and it is very hard to decide which ones will lead to acceptably efficient results.

Refinement through transformation is basically an example of the transliteration and refinement approach, or rather just refinement. Using transformations has several advantages. In particular, each transformation typically embodies a single implementation decision and is straightforward to understand in isolation. Further, since each transformation is correctness-preserving, it is clear that the result produced will be correct.

What is lacking in the transformational approach is a general strategy for making overall design decisions. It is not clear that it is possible to make these decisions on a local basis as individual transformations are applied. One alternate approach would be to pursue all of the major choices, compiling a given program many different ways and then picking the implementation which is best [14]. However, it is not clear that this approach can be practically applied to complex programs where large numbers of choices have to be made.
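The cost of pursuing every major choice grows multiplicatively. The Python sketch below enumerates all combinations of implementation choices for three abstract data objects and keeps the cheapest according to a cost function. The objects, options, and costs are hypothetical. The cost function includes one interaction term, so choosing the locally cheapest option for each object separately would give a poor result, which is the situation that makes local decisions unreliable. With k options for each of n objects there are k**n combinations: the three objects here give 8, and thirty two-way choices would give over a billion.

```python
from itertools import product


def best_implementation(choices, cost):
    """Try every combination of choices.

    choices: dict object-name -> list of candidate implementations
    cost:    function taking a dict object-name -> implementation
    Returns (cheapest combination, its cost, number of combinations tried).
    """
    names = list(choices)
    best, best_cost, tried = None, float("inf"), 0
    for combo in product(*(choices[n] for n in names)):
        tried += 1
        assignment = dict(zip(names, combo))
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = assignment, c
    return best, best_cost, tried


# Hypothetical objects, options, and per-option costs.
CHOICES = {"S": ["bitvector", "hashtable"], "M": ["array", "assoc-list"], "Q": ["list", "vector"]}
COST = {("S", "bitvector"): 1, ("S", "hashtable"): 3, ("M", "array"): 2,
        ("M", "assoc-list"): 5, ("Q", "list"): 4, ("Q", "vector"): 1}


def cost(assignment):
    total = sum(COST[(name, impl)] for name, impl in assignment.items())
    if assignment["S"] == "bitvector" and assignment["Q"] == "vector":
        total += 10   # hypothetical interaction between two separate choices
    return total


if __name__ == "__main__":
    print(best_implementation(CHOICES, cost))
```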

Another approach, which has not yet been tried, would be to use abstraction and reimplementation as the basis for choice. The goal would be to recognize patterns of computation in the source program which suggest that particular design choices should be used. A strategy would still be required for selecting between conflicting suggestions. However, this strategy could benefit from having a high level description of the conflict.
A number of source-to-source program translators exist. However, as a group, they are not as well developed as compilers, and relatively little has appeared in the literature about them. It seems that all current source-to-source translators operate via transliteration and refinement, doing relatively little refinement.

Unfortunately, source-to-source translators tend to be incomplete and incorrect. Most of them handle only part (around 90%) of the source language. Further, relatively few source-to-source translators correctly handle the sub-language they are applicable to.

As discussed in Section II, both of these problems stem from difficulties in transliteration. Source language constructs which cannot be reasonably transliterated are not supported. Further, transliteration methods which work most of the time, but not all of the time, are used as if they worked all of the time.

In addition to the problems above, when measured by the criterion of readability, the output of most translators is not particularly good. Although serviceable, the output produced seldom comes anywhere near the goal of being what the programmers would have written had they been writing in the target language.

Due to the difficulties above, it is not accurate to refer to typical source-to-source translation systems as automatic systems. It is more accurate to describe them as human-assisted translation systems. In order to obtain correct (let alone aesthetic) output, human intervention is usually required. The user has to edit the source program (to remove untranslatable constructs) and/or the target program (to correct errors and improve the translation).

As a straightforward example of a translator, consider the Lisp 1.6 to Interlisp translator implemented by Samet [21]. This translator operates purely by transliteration. It does no refinement. Although reasonably efficient output is produced, the translator makes no attempt to create aesthetic output. In particular, there is no attempt to create Interlisp-style output. Rather, a set of functions is defined in Interlisp which, as much as possible, allows Interlisp to simulate Lisp 1.6. For example, instead of translating the source program into Interlisp syntax, the Interlisp reader is modified so that it can read in a program in Lisp 1.6 syntax. In [21], Samet identifies a number of features of Lisp 1.6 which his translator cannot handle. The user is required to edit the source program in order to eliminate these features. Samet also describes several features of Lisp 1.6 which are translated in ways which are often, but not always, correct. The translation produced has to be carefully tested in order to check that these over-simple transliterations have not led to any problems.

At first glance, it might appear that translation between two dialects of Lisp should be easy. However, this is not the case. In fact, Lisp supports a number of features which are spectacularly difficult to translate. For example, a Lisp program can construct a new Lisp program and then execute this new program. Consider a Lisp 1.6 program which constructs a Lisp 1.6 program and then calls it as a subroutine. The program would have to be translated into an Interlisp program which constructs an Interlisp program. It is very unlikely that this kind of translation could be performed without using abstraction and reimplementation of the most powerful kind.

Another straightforward translator is the Fortran to Lisp translator implemented by Pitman [16]. Like Samet's translator, this translator operates purely by transliteration, doing no refinement. The translator produces readable output. However, it deliberately attempts to create Fortran-style output as opposed to Lisp-style output. The translation is supported by a set of functions which allow Lisp to simulate the Fortran runtime environment. This approach introduces a significant overhead which causes a translated program to run several times slower than the Fortran source program. Pitman's translator is far superior to Samet's translator in that, except for one or two very obscure features, all of the features of Fortran are translated correctly all of the time.

A third translator in this vein is the Fortran to Jovial translator implemented by Boxer [4]. Like the translators above, it operates purely by transliteration. The output of the translator is not intended for human consumption, and no attempt is made to make it particularly readable or to render it in Jovial style. (The examples in [4] indicate that the output is similar in style to the Ada shown in Fig. 3.) The translator only handles a subset of Fortran. It succeeds in translating from 90% to 100% of the typical input module. User intervention is required to complete the translation.

The Lisp to Fortran translator developed by Boyle [5] is interesting because it is based on the transformational approach discussed in the last subsection. The translator handles an applicative subset of Lisp which does not include such hard-to-translate features as the ability to create and execute new Lisp code. Readability is not a goal of the translation. Rather, readability of the output is abandoned in favor of producing reasonably efficient Fortran code. As discussed in [5], this translator is perhaps best thought of as a compiler of Lisp into Fortran rather than a source-to-source translator.

Boyle's translator operates by transliterating the Lisp source into an extension of Fortran and then transforming this extended Fortran into ordinary Fortran. The transformation process is controlled by dividing it into a number of phases. Each phase applies transformations selected from a small set. The transformations within each set are chosen so that conflicts between transformations will not arise.
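The phase structure can be sketched in Python as follows. Expressions are nested tuples; each phase is a small list of rewrite rules applied bottom-up until nothing changes; and the phases run in a fixed order, so rules from different phases never compete. The two phases and their rules (expanding a `square` operator, then folding integer constants) are invented for illustration and are not Boyle's transformations.

```python
def rewrite(expr, rules):
    """Apply `rules` bottom-up, repeatedly, until the expression stops changing."""
    while True:
        new = _rewrite_once(expr, rules)
        if new == expr:
            return expr
        expr = new


def _rewrite_once(expr, rules):
    if isinstance(expr, tuple):   # rewrite the operands first (bottom-up)
        expr = (expr[0],) + tuple(_rewrite_once(e, rules) for e in expr[1:])
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return expr


def expand_square(e):
    """("square", x) -> ("*", x, x)"""
    if isinstance(e, tuple) and e[0] == "square":
        return ("*", e[1], e[1])
    return None


def fold_constants(e):
    """("+" or "*", int, int) -> int"""
    if isinstance(e, tuple) and e[0] in ("+", "*") and all(isinstance(x, int) for x in e[1:]):
        return e[1] + e[2] if e[0] == "+" else e[1] * e[2]
    return None


# Each phase uses its own small rule set; phases run strictly in this order.
PHASES = [[expand_square], [fold_constants]]


def transform(expr, phases=PHASES):
    for rules in phases:
        expr = rewrite(expr, rules)
    return expr


if __name__ == "__main__":
    print(transform(("+", ("square", 3), "x")))
```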

Boyle's translator is successful not because it has solved the problems faced by very high level language compilers, but rather because it succeeds in avoiding them. First, compared to SETL, Gist, and V, Lisp is not very abstract. Therefore there are fewer complex design decisions which have to be made. Second, the design decisions are small enough in number that it is possible to find a fixed set of choices which works reasonably well for all of the Lisp programs being translated. These fixed choices are embedded in the translator through the choice of phases and transformations. Lists are always implemented the same way. Recursion is always simulated in the same way. This leads to the production of Fortran programs which are reasonably efficient, but typically far from optimally efficient.
In addition to the in-house translation systems described above, a number of translators are commercially available. One area where several translators are available is translating between assembler languages for various microprocessors. The discussion in [25] compares three commercially available translators between 8080 assembler and 8086 assembler. An in-house attempt at a translator between Z80 assembler and MC68000 assembler is described in [23]. All of these translators operate primarily by transliteration on an instruction by instruction basis and do little or no refinement. They all operate on only a subset of the source language and use simplistic transliterations which are not correct in all contexts. Human intervention is often required in order to obtain correct output. The translations produced are also quite inefficient, consisting of from 3 to 6 times as many instructions as the source. One of the 8080 to 8086 translators (XLT86 from Digital Research Inc.) uses global data and control flow analysis in order to guide the choice of transliteration for instructions. It produces output which is significantly more efficient and more often correct than the other translators.

Another area where a number of translators are available is translating between various languages used for business data processing (e.g., Cobol, RPGII, and PL/I). Numerous translators exist (for example, see [31], [32]). Substantive information about the internal operation of these translators is hard to obtain; however, several things are clear from their external descriptions. They do not handle the whole source language. In general, they only succeed in translating 90% to 95% of typical source programs. They do not always produce correct output. (In [32], the user is specifically instructed to test and debug the translations produced.) Examples suggest that the output is not particularly readable, and that the output was probably created primarily through transliteration.


Code  Restructuring 


An interesting subcategory of source-to-source translators consists of systems which translate a program from a language back into the same language. The goal of these systems is to create output which is more readable than the input. In particular, these systems typically seek to render unstructured source programs in a structured form. Given that the source and target languages are the same, it is a relatively straightforward matter to make sure that the entire source language is handled correctly. However, it is far from straightforward to produce output which really is significantly more readable than the input. Many of these systems are little more than pretty printers and are of marginal use. However, at least one system (Recoder [7]) is a true translator and creates highly structured output.

Recoder operates on Cobol programs in three stages. The first stage creates a flow chart-like graph representing the source program. The key feature of the graph representation is that all control flow is represented by explicit arcs which are independent of the Cobol constructs which were originally used to implement the control flow. The second stage applies correctness-preserving transformations to the graph in order to rearrange the graph into a structured form. The third stage creates a new Cobol program based on the rearranged graph.

Recoder represents a step toward the abstraction and reimplementation approach because the abstraction which it uses is clearly the driving force behind the translation. However, the step is strictly limited by a number of factors. The graph representation used is not very abstract. The only abstraction is away from particular control flow constructs. No attempt is made to recognize the algorithms being used in the source program or to abstract away from them.
Natural Language Translation


An interesting area which is closely related to program translation is natural language translation. Work on natural language translation started by using transliteration and, in a quest for high quality output, is now moving in the direction of translation via abstraction and reimplementation.


Almost all of the natural language translation systems which are in actual regular use today operate via transliteration and refinement (see [26]). In general, these systems produce output which is very rough, but which is readable to a person who is familiar with the subject area. A good example of such a system is the Paho system [20], which translates from Spanish to English.


Paho operates by transliterating the source text on a sentence by sentence basis. This transliteration is carried out for the most part on a word by word basis, with a small amount of inter-word analysis to take care of issues such as providing correct translations for idioms and rearranging the adjectives in a noun phrase. (Adjectives follow nouns in Spanish, whereas they precede nouns in English.) The practicality of this kind of transliteration depends heavily on a number of convenient correspondences between the basic structure of Spanish and English (e.g., the near identity of word order, and the fact that Spanish pronouns are more heavily marked for gender than English pronouns).
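A toy version of this style of transliteration is sketched below in Python: words are looked up one at a time in a bilingual dictionary, and a single local rule moves an adjective that follows a noun in front of it. The seven-word lexicon and the part-of-speech tags are hypothetical, and the sketch handles none of the idioms or other inter-word issues a real system such as Paho must cover. Note that the rule leaves a predicate adjective ("es grande") alone, because there the adjective follows a verb rather than a noun.

```python
# Hypothetical lexicon: Spanish word -> (English word, part of speech)
LEXICON = {
    "el": ("the", "DET"),
    "la": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "perro": ("dog", "NOUN"),
    "blanca": ("white", "ADJ"),
    "grande": ("big", "ADJ"),
    "es": ("is", "VERB"),
}


def transliterate(sentence):
    """Word-by-word translation with one local reordering rule."""
    words = [LEXICON.get(w, (w, "UNKNOWN")) for w in sentence.lower().split()]
    out, i = [], 0
    while i < len(words):
        # Spanish NOUN ADJ becomes English ADJ NOUN
        if i + 1 < len(words) and words[i][1] == "NOUN" and words[i + 1][1] == "ADJ":
            out += [words[i + 1][0], words[i][0]]
            i += 2
        else:
            out.append(words[i][0])   # unknown words pass through unchanged
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(transliterate("la casa blanca"))
    print(transliterate("el perro es grande"))
```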

Paho is not capable of refining the English it produces. Manual post-editing is required in order to generate an acceptable translation. The biggest weakness of Paho is that it knows very little about syntax and nothing about the meaning of the sentences being translated. Further, it has no knowledge of interactions between sentences.

In the quest for higher quality translations than the ones generated by systems like Paho, translators are now
being developed which operate more in the vein of abstraction and reimplementation. A good example of such a
translator is the Eurotra system [15], which is currently being developed to translate between the major western
European languages. Eurotra uses semantically annotated syntactic parse trees as an abstract representation for
the sentences being translated. Analysis (abstraction) and synthesis (reimplementation) components convert
source languages into parse trees and parse trees into target languages respectively.

Eurotra is not a true abstraction and reimplementation system because the annotated parse trees are not
independent of the source and target languages. Procedural transfer components are required in order to convert a
source language specific parse tree into a target language parse tree.
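The analysis, transfer, and synthesis stages described above can be illustrated with an equally small Python sketch for Spanish noun phrases. The dataclass `NP`, the functions `analyze_es`, `transfer_es_en`, and `synthesize_en`, and the bilingual lexicon are hypothetical and drastically simplified; Eurotra's actual representations are far richer. The point of the sketch is that the intermediate tree still contains Spanish words, so a transfer step specific to the Spanish/English pair cannot be avoided; only the semantic annotation (here, grammatical number) passes through unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NP:
    """Toy annotated parse tree for a noun phrase (hypothetical).

    The words are still in a particular language; only `number` is a
    language-independent annotation.
    """
    det: str
    noun: str
    adj: Optional[str]
    number: str  # "sg" or "pl"


def analyze_es(text: str) -> NP:
    """Analysis: parse a Spanish 'det noun [adj]' phrase into a tree."""
    words = text.lower().split()
    det, noun = words[0], words[1]
    adj = words[2] if len(words) > 2 else None
    number = "pl" if det in ("los", "las") else "sg"
    return NP(det, noun, adj, number)


# Hypothetical bilingual lexicon used only by the Spanish-to-English transfer step.
ES_EN = {
    "el": "the", "la": "the", "los": "the", "las": "the",
    "casa": "house", "casas": "houses",
    "blanca": "white", "blancas": "white",
}


def transfer_es_en(tree: NP) -> NP:
    """Transfer: rewrite a Spanish-specific tree into an English-specific tree."""
    adj = ES_EN[tree.adj] if tree.adj else None
    return NP(ES_EN[tree.det], ES_EN[tree.noun], adj, tree.number)


def synthesize_en(tree: NP) -> str:
    """Synthesis: linearize an English tree, placing the adjective before the noun."""
    words = [tree.det] + ([tree.adj] if tree.adj else []) + [tree.noun]
    return " ".join(words)


print(synthesize_en(transfer_es_en(analyze_es("las casas blancas"))))
```

With n languages, a design of this shape needs a separate transfer component for each ordered pair of languages; a representation that contained no source language words at all would let analysis feed synthesis directly.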

It is expected that Eurotra will produce significantly better output than Paho. However, it is expected that
Eurotra will still fall short of high quality translation. In particular, although Eurotra has much more syntactic
understanding than Paho, its semantic and inter-sentential understanding is still quite weak.

In order to achieve high quality translation, natural language translation systems have to be able to obtain an
in-depth understanding of the text being translated. One approach to this is the recent work on knowledge-based
machine translation (see [8]). This work has succeeded in demonstrating natural language translation via
abstraction and reimplementation. The abstract description used by this approach is a language independent
representation of the conceptual dependencies in the text. Knowledge-based machine translation is intended to
proceed by first analyzing the entire source text in order to determine its meaning and then re-expressing this
meaning in the target language using the syntactic structure of the source as a guide for what to say when.

Although knowledge-based machine translation holds the promise of generating very high quality output,
more work has to be done before a translator following this approach will be practical. In particular, as with
translation via abstraction and reimplementation in general, there is a significant problem with incompleteness.
Considerable further research has to be done before it will be possible to achieve anywhere near a complete
understanding of arbitrary passages of source text. However, perfection is not required. Human translators are
unable to translate technical texts unless they understand the technical area being discussed.